Updating the RNA polymerase CTD code: adding gene-specific layers


Review

Sylvain Egloff1,2, Martin Dienstbier3 and Shona Murphy3

1 Université de Toulouse, UPS, Laboratoire de Biologie Moléculaire Eucaryote, F-31000 Toulouse, France
2 CNRS, LBME, F-31000 Toulouse, France
3 Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford, OX1 3RE, UK

The carboxyl-terminal domain (CTD) of RNA polymerase (pol) II comprises multiple tandem repeats with the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 that can be extensively and reversibly modified in vivo. CTD modifications orchestrate the interplay between transcription and processing of mRNA. Although phosphorylation of Ser2 (Ser2P) and Ser5 (Ser5P) residues has been described as being essential for the expression of most pol II-transcribed genes, recent findings highlight gene-specific effects of newly discovered CTD modifications. Here, we incorporate these latest findings in an updated review of the currently known elements that contribute to the CTD code and how it is recognized by proteins involved in transcription and RNA maturation. As modification of the CTD has a major impact on gene expression, a better understanding of the CTD code is integral to the understanding of how gene expression is regulated.

The CTD code

In eukaryotes, the synthesis of all mRNAs and some small nuclear and nucleolar (sn/sno)RNAs is achieved by pol II. Rpb1, the largest pol II subunit, contains a highly flexible structure at its C terminus. This CTD is unique to eukaryotic organisms and comprises multiple tandemly repeated heptapeptides with the consensus sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser (Y1S2P3T4S5P6S7). The number of repeats varies from organism to organism and appears to correlate with genomic complexity, from 26 (all consensus) repeats in the yeast Saccharomyces cerevisiae to 52 (21 consensus and 31 non-consensus) repeats in the mammalian CTD [1,2] (Figures 1 and 2). Strikingly, the amino acids of the consensus repeat can all potentially be modified independently of one another. The serine, tyrosine and threonine residues can be phosphorylated in vivo, and cis-trans isomerization of the two proline residues can also occur [3].
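The split between consensus and non-consensus repeats can be made concrete with a small sketch. It is an illustration only, not an analysis from the review: the heptad sequences and repeat numbers are transcribed from Figure 2b, and the names `NON_CONSENSUS`, `variant_positions` and `tally_variation` are hypothetical helpers introduced here.

```python
from collections import Counter

CONSENSUS = "YSPTSPS"

# Non-consensus repeats of the human CTD as listed in Figure 2b:
# heptad sequence -> repeat numbers (counted from the N terminus).
NON_CONSENSUS = {
    "YSPTSPK": [35, 39, 40, 47, 49],
    "YSPTSPT": [41, 43, 46, 48, 51],
    "YSPTSPN": [6, 22, 23, 26],
    "YSPTSPA": [1],
    "YSPTSPV": [44],
    "YSPTSPG": [50],
    "YTPTSPS": [24, 36],
    "YSPSSPS": [34],
    "YTPTSPK": [38, 45],
    "YTPTSPN": [27],
    "YTPSSPS": [33],
    "YTPQSPS": [3],
    "YSPSSPE": [37],
    "YSPSSPR": [31],
    "YSPTTPK": [42],
    "YSLTSPA": [52],
    "YTPQSPT": [32],
    "YEPRSPG": [2],
}


def variant_positions(heptad):
    """Return the 1-based positions at which a heptad differs from the consensus."""
    return [i + 1 for i, (a, b) in enumerate(zip(heptad, CONSENSUS)) if a != b]


def tally_variation(repeats):
    """Count, per heptad position, how many repeats deviate from the consensus."""
    counts = Counter()
    for heptad, numbers in repeats.items():
        for pos in variant_positions(heptad):
            counts[pos] += len(numbers)
    return counts


n_non_consensus = sum(len(numbers) for numbers in NON_CONSENSUS.values())
variation = tally_variation(NON_CONSENSUS)
print(n_non_consensus, dict(sorted(variation.items())))
```

With the Figure 2b data as transcribed, the tally covers 31 repeats, finds no deviation at positions 1 and 6, one deviation each at positions 3 and 5, and deviations at position 7 in 26 repeats.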
The ability of the CTD to be modified at each residue can generate a wide range of distinct combinations (Figure 1), and each combination could contain information that is crucial at different steps of the transcription cycle. Therefore, the CTD can be seen as a dynamic scaffold continuously signalling between the transcription machinery and factors that interact with pol II and/or newly synthesized RNA; in addition, the different combinations of modifications have been likened to a readable code [3–5]. The CTD is dispensable for polymerase activity in vitro, and its partial deletion can be tolerated [6,7]; however, deletion of the entire CTD in mice, Drosophila or yeast is lethal [3]. It is now well established that the CTD plays a direct and major role in coupling transcription with cotranscriptional nuclear processes, such as chromatin modification and RNA processing [8]. The CTD is also implicated in the recruitment of factors involved in transcription-coupled genome stability and RNA export [9–11]. Interaction of factors with the CTD is largely determined by the modification state of the CTD heptapeptides. Accordingly, CTD modification undergoes dramatic changes during transcription to recruit factors at the appropriate point of the transcription cycle.

Recent studies have identified new post-translational modifications of the CTD repeats, which further expand the complexity of the code. In addition, new CTD-binding factors have been discovered, and there are new insights into the binding requirements of previously described CTD-binding factors. Here, we provide an updated view of the current knowledge of the ‘writers’ of the CTD code, and how this is interpreted by the ‘readers’. This new information further emphasizes the impact that CTD modification has on gene expression and that modifications can have gene-specific effects.

Components of the code

Dynamic phosphorylation of the three serine residues is by far the best-characterized CTD modification and has been shown to be directly responsible for the binding and release of a range of pol II-associated factors. Serine 5 phosphorylation (Ser5P) and Ser2P appear to be components of the code required for expression of all gene types, whereas Ser7P displays gene-specific features. In addition, the recent discovery of the in vivo phosphorylation of Thr4, which has a gene-specific function, underlines the increasing complexity of the CTD phosphorylation code (Figure 1). Other known modifications include glycosylation of the serines and phosphorylation of Tyr1, but no clear function has yet been attributed to these marks [3]; neither have they been shown to influence CTD recognition by factors. However, the conformation of the two invariant prolines in either a cis or trans orientation is important for the binding of some factors and can be considered a key element of the code (Figure 1).

Corresponding author: Murphy, S. ([email protected]). Keywords: carboxyl-terminal domain; transcription; RNA processing; RNA polymerase II.

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2012.03.007 Trends in Genetics, July 2012, Vol. 28, No. 7


[Figure 1 graphic: the heptapeptide YSPTSPS drawn in each of its 16 phosphorylation states (none; S2; T4; S5; S7; S2, T4; S2, S5; S2, S7; T4, S5; T4, S7; S5, S7; S2, T4, S5; S2, T4, S7; S2, S5, S7; T4, S5, S7; S2, T4, S5, S7) × 4 proline isomerization states (cis, cis; cis, trans; trans, cis; trans, trans) × 52 repeats in mammals [minus the changes in the non-consensus repeats (Figure 2)] or × 26 repeats in yeast.]

Figure 1. Potential modifications of the RNA polymerase (pol) II carboxyl-terminal domain (CTD) heptapeptide during transcription. All the possible serine phosphorylation and proline isomerization combinations are shown. It has not been ruled out that Tyr1 phosphorylation coexists with serine and Thr4 phosphorylation. In addition, the glycosylation state of Ser2, Thr4, Ser5 and Ser7 may also play a role in recognition of the CTD by factors, and multiple differentially glycosylated forms are possible. However, because glycosylation and phosphorylation are mutually exclusive, the main role of glycosylation may be to block phosphorylation of polymerases not engaged in transcription [70,71]. Both off and on DNA, subsets of these combinations will influence the function of pol II. Insertion of an extra amino acid between every pair of heptapeptide repeats is tolerated, whereas insertion between each single repeat is not, suggesting that the unit of recognition is two CTD heptapeptides [72] (Figure 3). This would increase the potential complexity of the CTD code (256 potential combinations for phosphorylation sites and 16 for proline isomerization) but reduce the number of different protein-binding sites on the same CTD. However, it has been argued that SPSYSPT is the functional repeat rather than the conventionally used YSPTSPS [1]. Although the code is potentially complex, with each of 52 repeats or 26 pairs having a different set of modifications, in reality the combinations will be restricted by the recruitment of different sets of modification enzymes at the appropriate points. In addition, some modifications will enhance or preclude others. See also [4,5]. Phosphorylation is indicated by red circles and trans or cis isomerization of prolines by a blue t or c, respectively, below the amino acid.
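The counts quoted in the legend follow from simple enumeration. The sketch below is a minimal illustration that treats the four phosphorylation sites and the two prolines as independent, which the legend itself notes is an upper bound; the function and variable names are hypothetical.

```python
from itertools import combinations, product

PHOSPHO_SITES = ("S2", "T4", "S5", "S7")  # phosphorylation sites shown in Figure 1
PROLINES = ("P3", "P6")                   # the two invariant prolines
ISOMERS = ("cis", "trans")


def phosphorylation_states(sites=PHOSPHO_SITES):
    """All subsets of the phosphorylatable sites, from none to all four."""
    return [frozenset(c) for r in range(len(sites) + 1) for c in combinations(sites, r)]


def isomerization_states(prolines=PROLINES):
    """Every cis/trans assignment of the prolines."""
    return [dict(zip(prolines, iso)) for iso in product(ISOMERS, repeat=len(prolines))]


phospho = phosphorylation_states()
isomer = isomerization_states()

# States of a single heptapeptide.
per_heptad = len(phospho) * len(isomer)

# If the unit of recognition is a pair of heptapeptides, each half varies independently.
pair_phospho = len(phospho) ** 2
pair_isomer = len(isomer) ** 2

print(len(phospho), len(isomer), per_heptad, pair_phospho, pair_isomer)
```

This reproduces the 16 phosphorylation and 4 isomerization combinations per heptapeptide, and the 256 and 16 combinations for a heptapeptide pair. Tyr1 phosphorylation and glycosylation are deliberately left out of the count, as in the figure.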

In addition to 21 consensus repeats, the mammalian CTD contains 31 non-consensus repeats, mainly located in the C-terminal region, which may have specific roles in CTD function (Figure 2a). For example, the single Arg residue in position 7 of repeat 31 (R1810) can be methylated during transcription [12] and the Lys residues in eight of these repeats are potential substrates for acetylation, methylation, sumoylation and ubiquitylation (Figure 2b,c). However, it is striking that the non-consensus repeats are still very similar to the consensus repeats. For example, positions 1 and 6 are identical in all repeats, position 2 mainly switches between Ser and Thr, and positions 3 and 5 each vary in only one repeat.

A complicated interplay between CTD kinases and CTD phosphatases generates a characteristic pattern of serine phosphorylation on the CTD as pol II moves along the genes (Figure 3). Analysis of a limited number of mammalian protein-coding genes indicates that, in general, the pattern of CTD phosphorylation, relative to the total levels of pol II, changes as transcription progresses; high levels of Ser5 phosphorylation at the beginning of the transcription cycle are reduced towards the end, whereas Ser2 and Ser7 phosphorylation are higher towards the end of the transcription unit [13–17]. The phosphorylation pattern on mammalian snRNA genes differs in some respects, suggesting that specific gene types have different CTD modification patterns [14] (Box 1). Extensive analysis of CTD phosphorylation patterns in the yeasts S. cerevisiae and Schizosaccharomyces pombe instead suggests uniform patterns of Ser2, Ser5 and Ser7 phosphorylation across genes, with changes occurring at set distances from the transcription start site [18–21].

Writers of the code

CTD kinases old and new

The CDK7 subunit of the TFIIH pre-initiation factor in higher eukaryotes and its homologs Kin28 in S. cerevisiae and Mcs6 in S. pombe (Table 1) are thought to be responsible for most Ser5 phosphorylation early during the transcription cycle. Accordingly, specific inhibition of these enzymes causes a drastic reduction of Ser5 phosphorylation in vivo [17,22–26]. CDK8, which, similar to CDK7, is recruited to promoters, phosphorylates Ser2 and Ser5 in vitro. However, its in vivo contribution to CTD phosphorylation remains unclear [27].

The situation with Ser2 phosphorylation is more complicated; in S. cerevisiae, two Ser2 kinases that are essential for normal growth have been identified, Bur1 and Ctk1. Most Ser2P on elongating RNA pol II appears to be catalyzed by Ctk1 [8,28], whereas Bur1 was first thought to stimulate elongation and suppress aberrant initiation of transcription by acting on non-CTD substrates, such as the elongation factor Spt5 [29]. However, the situation might be more complex as recent studies suggest that Bur1 can also phosphorylate Ser2 of the CTD [28,30,31] and stimulate subsequent Ser2 phosphorylation by Ctk1 [28]. S. pombe has two Ser2 kinases, Cdk9 and Lsk1, which are equivalent to Bur1 and Ctk1, respectively [24] (Table 1).
In metazoans, Cdk9, the kinase subunit of the positive transcription elongation factor b (P-TEFb), phosphorylates both Spt5 and Ser2 [32]. This dual functionality of Cdk9 led to the assumption that P-TEFb recapitulates the activities of both Bur1/Cdk9 and Ctk1/Lsk1. However, recent studies in Drosophila and human cells have identified two additional Ser2 kinases, Cdk12 and Cdk13 [33–35]. These display CTD kinase activity in vitro, are associated with cyclin K (CycK) and have a conserved CTD kinase domain. Importantly, Cdk12 is required for most Ser2 phosphorylation in vivo and is associated with elongating pol II, leading to the conclusion that this is the metazoan ortholog of Ctk1. Knockdown of this CTD kinase affects expression of a range of human genes, including the DNA-damage response (DDR) genes [33]. However, the function of Cdk13 remains obscure. The identification of these new metazoan Ser2 kinases suggests that the division of function in yeast between Bur1- and Ctk1-type kinases has been retained in higher organisms.

[Figure 2 graphic: (a) schematic of the 52 repeats of the human CTD, numbered from the N terminus, with a key distinguishing consensus repeats, non-consensus repeats and the site-specific modification (R1810). (b) The 31 non-consensus repeats, with repeat numbers in parentheses: (35, 39, 40, 47, 49) YSPTSPK; (41, 43, 46, 48, 51) YSPTSPT; (6, 22, 23, 26) YSPTSPN; (1) YSPTSPA; (44) YSPTSPV; (50) YSPTSPG; (24, 36) YTPTSPS; (34) YSPSSPS; (38, 45) YTPTSPK; (27) YTPTSPN; (33) YTPSSPS; (3) YTPQSPS; (37) YSPSSPE; (31) YSPSSPR; (42) YSPTTPK; (52) YSLTSPA; (32) YTPQSPT; (2) YEPRSPG. Key to known and potential new modifications: K, acetylation/methylation/sumoylation/ubiquitination; T, phosphorylation/glycosylation; S, phosphorylation/glycosylation; R, methylation. (c) Amino-acid abundance at positions 1 to 7 of the non-consensus repeats.]

Figure 2. The non-consensus repeats in the mammalian carboxyl-terminal domain (CTD) could be further modified. Although the only modification of the non-consensus repeats to be shown so far is methylation of R1810 [12], there are many other potential non-consensus repeat-specific modifications that could influence CTD function. (a) Schematic of the human CTD. Consensus repeats (in purple) are found in the N-terminal region of the CTD, whereas non-consensus repeats (in blue) mainly localize to the C-terminal region. The non-consensus repeat containing R1810 is indicated as a striped box. The repeat number is counted from the N terminus and is indicated above the repeats. (b) The non-consensus repeats in the human CTD and their potential modifications. The numbers in parentheses refer to the number of each repeat as in (a). (c) Schematic representation of amino-acid abundance at each position of the non-consensus heptapeptide repeats.

Identification of Ser7 kinases

Ser7 of the heptapeptide repeat is also phosphorylated during transcription of snRNA and protein-coding genes in human and yeast cells [13,19,20,26,31,36]. Mutation of Ser7 to alanine in all repeats is not lethal in fission yeast [37] and has little effect on the expression of protein-coding genes in mammals [13,36]. However, this mutation causes a marked defect in transcription of human snRNA genes and 3′ processing of the transcripts [36]. This was the first indication that elements of the CTD code could play gene-specific roles. Unexpectedly, Cdk7/Kin28, the kinase responsible for phosphorylating Ser5, was also identified as being crucial for Ser7 phosphorylation in yeast and human cells [17,23,26,31,38]. Accordingly, Ser7P profiles at the beginning of snRNA and protein-coding genes generally resemble those of Ser5P, and chemical inactivation of Cdk7 drastically reduces both Ser5 and Ser7 phosphorylation [26]. However, Cdk7 is not the only Ser7 kinase. Whereas Ser5P and Ser7P precipitously decrease at the 5′ end of genes after inactivation of Kin28, the level of Ser7P in coding regions remains unaffected [31]. Inactivation of the yeast Bur1 kinase reduces the levels of Ser7P within coding regions [31] and Cdk9, the Bur1 homolog in humans, can phosphorylate Ser7 in vitro [17].

Phosphorylation of Thr4; a new gene-specific mark?

In addition to the three serine residues, Thr4 of the CTD consensus repeat has recently been shown to be phosphorylated in vivo [39]. Although mutation of Thr4 in all repeats to alanine is not lethal in yeast [37], this new CTD mark is crucial for processing, but not transcription, of the intron-less replication-activated histone genes in chicken [39]. Importantly, expression of other protein-coding genes or non-coding RNA genes remains unaffected by Thr4 mutation. Thus, similar to Ser7P for snRNA genes [36], Thr4P plays a highly specific role in facilitating efficient expression of a specialized group of intron-less protein-coding genes. DRB and flavopiridol, which both inhibit Cdk9, inhibit Thr4 phosphorylation, suggesting that this mark is Cdk9 dependent [39]. In line with these results, knockdown of Cdk9 impairs recruitment of the replication-activated histone gene-specific RNA processing factor SLBP to histone genes and leads to an accumulation of polyadenylated mRNA [40]. Analysis of the distribution of the Thr4P mark on the CTD of elongating pol II awaits the development of a ChIP-able Thr4P antibody.

New phosphatases and a novel function for an old phosphatase

Several enzymes have been implicated in the removal of phosphates from the CTD at specific points of the transcription cycle, leading to the classic profile of high Ser5P at the beginning of genes replaced later by high Ser2P [3] (Figure 3, Table 1). Both Ssu72 and the small CTD phosphatases, including SCP1, can remove the phosphate from Ser5P in vivo [3]. SCP1 is recruited to promoters specifically to regulate expression of neuronal genes in non-neuronal cells [41], but its role in general transcription is unclear. The Ser5P phosphatase Ssu72 dephosphorylates Ser5P in yeast [42]. However, a second Ser5P phosphatase, Rtr1, has recently been identified in yeast [43]. Rtr1 is an atypical phosphatase, as it contains no known functional phosphatase motif. Rtr1 is found associated with genes at the point where Ser5P levels drop and Ser2P levels start to rise (Figure 3), and its inactivation induces higher Ser5P levels across the coding region. The existence of Ssu72 probably explains why deletion of Rtr1 is not lethal in yeast. The redundancy of these two phosphatases and their respective role in Ser5P dephosphorylation during transcription requires further investigation. Interestingly, Ssu72 has recently been shown to also remove the phosphates from Ser7P [21].

The human homolog of Rtr1, RNA pol II-associated protein 2 (RPAP2), was first discovered as a protein closely associated with the pol II machinery [44]. RPAP2 also has Ser5 phosphatase activity, and its knockdown results in increased Ser5P levels on transcribing pol II [45]. RPAP2 is detected at the 5′ end of both protein-coding genes (Figure 3) and snRNA genes and is essential for efficient transcription and 3′ end-processing of snRNA transcripts.

The levels of Ser2P are modulated at the end of the transcription unit by the Ser2 phosphatase Fcp1 [18,42], making RNA pol II available for the next round of transcription. Interestingly, it has recently been shown that the mitotic phosphatase Cdc14 removes phosphates from Ser2P and Ser5P to repress transcription during mitosis [46].

[Figure 3 graphic: schematic profiles of Ser5P, Ser7P and Ser2P levels (serine phosphorylation level) along a gene from the promoter (5′) to the 3′ end. Yeast, with stages initiation, elongation and RNA 3′ end formation/termination: kinases Kin28, Bur1 and Ctk1; phosphatases Rtr1, Ssu72 and Fcp1. Mammals, with stages initiation, early elongation, productive elongation and RNA 3′ end formation/termination: kinases Cdk7, Cdk9, Cdk9? and Cdk12?; phosphatases RPAP2, Scp1, Ssu72? and Fcp1.]

Figure 3. The carboxyl-terminal domain (CTD) phosphorylation pattern across yeast and human protein-coding genes. The average level of Ser2P, Ser5P and Ser7P relative to the total RNA polymerase (pol) II level detected on protein-coding genes during the different steps of the transcription cycle is represented for yeast and mammals. The kinases and phosphatases responsible for establishing these patterns are noted above and below, respectively. ? denotes that an enzyme has not yet been clearly demonstrated to have this function. The figure is based on the results from many different studies [13–21,23,31,33,43,45,49] and is only intended to give a general view of the changes in phosphorylation patterns and where kinases and phosphatases are active across protein-coding genes, rather than an accurate representation.

Localization of Ser2P, Ser5P and Ser7P

Is the code written uniformly on all pol II-transcribed genes or are there gene-specific patterns? To explore the ‘universality’ of the code, genome-wide localization of the three CTD marks (Ser2P, Ser5P and Ser7P) has been performed on the genomes of S. cerevisiae and S. pombe [18–20,31]. The current paradigm was confirmed by the finding that Ser5P and Ser2P are reciprocally enriched at promoters and 3′ end of genes, respectively (Figure 3). However, the Ser5P mark is also readily detected at the 3′ end of some genes [18,31]. Along with Ser5P, the Ser7P mark is placed early during the transcription cycle, as would be expected if Cdk7 phosphorylates both residues. However, contrary to Ser5P, Ser7P persists at robust levels until transcription termination at all pol II-transcribed genes in yeast [31]. Importantly, RPAP2/Rtr1, the phospho-Ser5 phosphatase, has no effect on Ser7P [23,45]. Thus, dephosphorylation of Ser5P early during the transcription cycle, but not Ser7P, by Rtr1 could explain the differential profiles observed between Ser5P and Ser7P. However, specific inhibition indicates that distal Ser7P marks are dependent on the activity of Bur1, which suggests that Ser7P has an important function in transcription elongation [31]. In line with this possibility, high levels of Ser7P are detected on highly transcribed genes [19,31].

Importantly, distinct patterns of CTD phosphorylation are observed at non-coding versus protein-coding genes (Box 1). The level of Ser2P is markedly lower on non-coding genes. By contrast, Ser7P is equivalent to, or higher on, non-coding genes than on protein-coding genes [19,31]. Thus, a low level of Ser2P and an abundance of Ser7P at non-coding genes could serve as a CTD gene-type specific signal.

Interestingly, there is a gene-specific requirement for Lsk1-mediated CTD Ser2 phosphorylation for induction of expression of the Ste11 gene, which encodes a protein that promotes meiosis [47]. However, this requirement can be bypassed by mutating all Ser7 residues [37], emphasizing that the balance between the different modifications is important. Interestingly, Ser5P was recently shown to prime the CTD for subsequent phosphorylation of Ser2 [24,28], indicating that crosstalk contributes to the final pattern.

In many higher eukaryotic genes, pol II stalls after transcribing 50–100 base pairs (bp) at an early elongation checkpoint owing to the negative elongation factors NELF and DSIF. Phosphorylation of these factors by P-TEFb plays a major role in overcoming this block to productive elongation [48]. Although phosphorylation of Ser2 of the pol II CTD occurs at the same time, the role of this modification in elongation is not clear [48]. However, there is good evidence that Ser2P can influence splicing and polyadenylation [3]. In line with this function, phosphorylation of Ser2 generally appears downstream from the promoter and increases towards the 3′ end of genes [13–16,49] (Figure 3). As found for the generally much longer protein-coding genes, Ser5P and Ser2P peak at the beginning and end, respectively, of the very short human snRNA genes [14]. However, phosphorylation levels and the pattern of Ser7 differ [14,17]. These differences are probably related to the differences in the mechanics of transcription elongation and RNA processing between the two gene types (see above; Box 1).

Box 1. CTD serine phosphorylation on mammalian protein-coding and non-coding genes

Protein-coding genes represent most pol II-transcribed genes. In mammals, genes for non-coding snRNAs, which are short (a few hundred bps, whereas protein-coding genes can be kbs) and have a different promoter structure and RNA processing signals, are also transcribed by pol II [68]. snRNA gene-specific transcription and processing factors have been identified [68]. Because the CTD plays a key role in recruiting factors to the transcription machinery, one could predict differences in the CTD modification pattern between the two types of pol II-transcribed genes. Indeed, Ser7 phosphorylation peaks at the beginning of snRNA genes and at the end of protein-coding genes (Figure 3), and the level of phosphorylation, relative to pol II, is lower for all three serines on snRNA genes, with Ser2P being particularly low [14,17]. This might be the consequence of the short length of these genes because Ser2P generally occurs later in the transcription cycle. During transcription of protein-coding genes, Ser2P is implicated in elongation and can activate splicing and 3′ end processing and/or termination. Because the snRNA genes are intronless, the requirement for a high level of Ser2P might be bypassed. CTD phosphorylation might also be lower because of the higher level of actively engaged pol II on snRNA genes [14].

Ser7P specifically recruits the RPAP2 Ser5P phosphatase and the RNA 3′ end processing Integrator complex to snRNA genes, ensuring proper transcription and processing of transcripts [45]. By contrast, mutation of Ser7 has no obvious effect on protein-coding gene expression [13]. Binding of the Int11 catalytic subunit of Integrator is dependent on both Ser7P and Ser2P [69], indicating that low levels of Ser2P are sufficient for Int11 recruitment and proper processing of snRNA. Recruitment of RPAP2 to snRNA genes by Ser7P early during the transcription cycle may be responsible for the sharp drop in Ser5P levels seen in these genes soon after initiation [14].

In yeast, a lower level of Ser2P is also observed at the short genes for non-coding snoRNAs compared with protein-coding genes [19,31]. Interestingly, whereas termination factors require Ser2P to be recruited to protein-coding genes, the snoRNA gene-specific termination factor, Nrd1, recognizes Ser5P [61]. In contrast to the low level of Ser2P, Ser7P occurs at high levels on non-coding RNA genes [19,31]. However, it is not known whether this is a positive mark to recruit gene-specific factors. Finally, a recent study highlighted a gene-specific effect of the methylation of R1810 within the mammalian non-consensus CTD repeats [12]. This modification restricts transcription of snRNA genes, without affecting protein-coding gene expression.

The role of proline isomerization

The peptidyl proline isomerases Ess1 in yeast and Pin1 in mammals can isomerize the prolines at positions 3 and 6 of the CTD repeat [3]. The polyadenylation/termination factor, Pcf11, binds exclusively to repeats with Ser2P and prolines in the trans configuration [50], which is the dominant isomer in equilibrium [51]. By contrast, the Ssu72 Ser5P phosphatase recognizes repeats with Ser5P and the downstream proline in the cis configuration [51], and it has been demonstrated that Ess1 and Pin1 activate Ssu72 to promote Ser5P dephosphorylation [51–53]. In addition, Cdc2/cyclin B hyperphosphorylates the CTD in M phase in a Pin1-dependent manner [54].


Table 1. CTD kinases and phosphatases in yeast and mammals a

Residue | Activity | Mammals | Saccharomyces cerevisiae | Schizosaccharomyces pombe | Refs
Ser2 | Kinase | Cdk9, Cdk12 (Cdk13?) | Bur1, Ctk1 | Cdk9, Lsk1 | [24,32–35]
Ser2 | Phosphatase | Fcp1, Cdc14 | Fcp1 | Fcp1 | [46]
Ser5 | Kinase | Cdk7, Cdk8 | Kin28, Cdk8 (Srb10) | Mcs6, Cdk8 | [17,22–25,27]
Ser5 | Phosphatase | RPAP2, Scp1, Ssu72?, Cdc14 | Rtr1, Ssu72 | Rtr1? | [42,43,45,46]
Ser7 | Kinase | Cdk7, Cdk9 | Kin28, Bur1 | Mcs6? | [17,26,31,38]
Ser7 | Phosphatase | Ssu72? | Ssu72 | ? | [21]
Thr4 | Kinase | Cdk9? | ? | ? | [39]

a ? denotes either that an enzyme has not yet been identified or that the suggested enzyme has not yet been unequivocally demonstrated to have this function.
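Table 1 can also be read in the other direction, asking which marks a given enzyme is proposed to write or erase. The sketch below is a minimal transcription of the table into a lookup structure; the names `CTD_ENZYMES` and `marks_for` are hypothetical, the "?" suffix is carried over from the table as a "not unequivocally demonstrated" flag, and an empty list stands for a bare "?" entry.

```python
# Table 1 as a mapping: (residue, activity) -> organism -> enzymes.
# A trailing "?" is kept from the table; [] means no enzyme has been identified.
# The S. cerevisiae Cdk8 entry is also known as Srb10.
CTD_ENZYMES = {
    ("Ser2", "kinase"): {
        "mammals": ["Cdk9", "Cdk12", "Cdk13?"],
        "S. cerevisiae": ["Bur1", "Ctk1"],
        "S. pombe": ["Cdk9", "Lsk1"],
    },
    ("Ser2", "phosphatase"): {
        "mammals": ["Fcp1", "Cdc14"],
        "S. cerevisiae": ["Fcp1"],
        "S. pombe": ["Fcp1"],
    },
    ("Ser5", "kinase"): {
        "mammals": ["Cdk7", "Cdk8"],
        "S. cerevisiae": ["Kin28", "Cdk8"],
        "S. pombe": ["Mcs6", "Cdk8"],
    },
    ("Ser5", "phosphatase"): {
        "mammals": ["RPAP2", "Scp1", "Ssu72?", "Cdc14"],
        "S. cerevisiae": ["Rtr1", "Ssu72"],
        "S. pombe": ["Rtr1?"],
    },
    ("Ser7", "kinase"): {
        "mammals": ["Cdk7", "Cdk9"],
        "S. cerevisiae": ["Kin28", "Bur1"],
        "S. pombe": ["Mcs6?"],
    },
    ("Ser7", "phosphatase"): {
        "mammals": ["Ssu72?"],
        "S. cerevisiae": ["Ssu72"],
        "S. pombe": [],
    },
    ("Thr4", "kinase"): {
        "mammals": ["Cdk9?"],
        "S. cerevisiae": [],
        "S. pombe": [],
    },
}


def marks_for(enzyme, organism):
    """List (residue, activity, tentative) entries for one enzyme in one organism."""
    hits = []
    for (residue, activity), by_organism in CTD_ENZYMES.items():
        for entry in by_organism[organism]:
            if entry.rstrip("?") == enzyme:
                hits.append((residue, activity, entry.endswith("?")))
    return hits


print(marks_for("Cdk9", "mammals"))
print(marks_for("Ssu72", "S. cerevisiae"))
```

For example, `marks_for("Cdk9", "mammals")` lists Cdk9 as a Ser2 and Ser7 kinase and, tentatively, as the Thr4 kinase, which mirrors the three rows in which it appears.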

Non-consensus repeats can also be modified The presence of non-consensus repeats in the distal part of the mammalian CTD might reflect additional regulatory mechanisms not present in yeast. Indeed, 31 of the 52 repeats that comprise the human CTD differ from the consensus sequence at one or more positions, and variations predominantly occur in position 7 [1] (Figure 2). One of these non-consensus repeats (repeat 31) contains an arginine instead of a serine in position 7. This residue (R1810) has been shown to be methylated by coactivatorassociated arginine methyltransferase 1 (CARM1) [12]. Ser2P or Ser5P inhibits CARM1 activity at this site in vitro, suggesting that methylation occurs before these residues get phosphorylated. However, R1810 methylation is probably preserved on the transcribing polymerase, as this modification is detected on phosphorylated pol II. Importantly, substitution of R1810 by alanine results in the mis-expression of a variety of snRNAs and snoRNAs, whereas the expression of protein-coding genes is unaffected. Contrary to the inhibitory effect on snRNA gene expression of the Ser7 mutation [36], expression of snRNA and snoRNA were upregulated when R1810 was mutated to alanine, suggesting a repressive rather than activating function for this mark. Therefore, methylation of R1810 by CARM1 regulates the expression of a subclass of RNAs, further expanding the gene-specific functions associated with the CTD. How methylation of R1810 specifically restricts expression of snRNA and snoRNA is currently unknown. The Tudor-domain-containing protein TDRD3 binds specifically to this modification [12]. However, this protein does not appear to mediate regulation of snRNA or snoRNA gene expression. Readers of the code Several studies have identified proteins that bind specifically to either unphosphorylated CTD repeats or repeats with distinct patterns of modification [3,55,56]. 
In some cases, the CTD modification requirements for interaction have been well characterized and, in general, there is good correspondence between the point of the transcription cycle where modifications appear and the recruitment and/or requirement for the binding factor (Figure 4). The Ser5-phosphorylated CTD is required for the recruitment of the guanylyl transferase (GTase) responsible for adding the cap structure to the 50 end of nascent transcripts. Crystallographic studies of a CTD phosphopeptide bound to murine GTase have revealed that the interaction relies almost exclusively on contacts with Ser5P in one repeat and Tyr1 of the next repeat [57]. In S. pombe, artificial delivery of the capping enzyme to the 338

transcription machinery by covalently tethering the capping enzyme to the mutated CTD can overcome Ser5 mutation to Ala [37]. In addition, specific inhibition of Kin28 has little effect on transcription of protein-coding genes, but causes a striking reduction of capping [22,25]. These results suggest that the main function of Ser5P is to recruit the capping enzyme. However, additional proteins also require the Ser5P mark for recruitment, including the Set1/COMPASS histone methyltransferase responsible for trimethylation of H3K4 at the 50 end of transcribed genes [58], the Rtr1 and Scp1 Ser5P phosphatases [43,59] and the Rpd3S histone deacetylase [60]. The S. cerevisiae Nrd1 protein involved in the early termination pathway used for snRNA and snoRNA genes and cryptic unstable transcripts (CUTs) also recognizes Ser5P [61,62] (Figure 4). The increase in Ser2P during transcription that eventually leads to this mark predominating at the 30 end of genes occurs in parallel with the recruitment of histone modification, elongation, splicing, transport, 30 end RNAprocessing factors and transcription termination factors. For example, the splicing factors Prp40 and U2AF65 recognize the Ser2P–Ser5P double mark [3,55,63]. U2AF65 then recruits Prp19 to activate splicing [63]. Splicing, in turn, helps recruit factors that are important for transport of the RNA across the nuclear membrane [64], and it has recently been shown that the yeast export factor, Yra1, is recruited to transcription complexes through direct interaction with Ser2P–Ser5P CTD repeats [10]. The H3K36 methyltransferase, Set2, is also recruited by the Ser2P– Ser5P double mark to catalyze methylation of histone H3K36 [65]. This methylation mark leads to activation of the Rpd3S histone deacetylase complex, which prevents cryptic transcription initiation within open reading frames in yeast [60,66]. 
The RNA binding factor, Ssd1, the mitotic kinase, Hrr25, and the RecQ5 genome stability helicase also recognize Ser2P–Ser5P CTD repeats [9,55]. However, the precise roles that Ssd1 and Hrr25 play during the transcription cycle are unclear. The elongation-associated proteins, Npl3 and Spt6, as well as Rtt103, a termination factor that contributes to the ‘torpedo’-mediated termination mechanism in yeast, recognize repeats with Ser2P alone (Figure 4). The RNA-processing factor Pcf11 selectively binds to CTD repeats with Ser2P and all-trans prolines. Both Pcf11 and Rtt103 bind cooperatively to neighboring Ser2P repeats [67], ensuring that binding only occurs efficiently when many repeats have Ser2P, which takes place towards the end of the transcription unit.
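The mark-to-factor relationships described above can be summarized as a simple lookup table. The following Python sketch is only an illustration of that summary: the dictionary layout, the key strings and the function name `readers_for` are assumptions made here, not part of any published tool, and the table lists only factors named in the preceding paragraphs. It ignores proline isomerization, repeat position and cooperative binding.

```python
# Factors grouped by the CTD phosphorylation pattern they are described
# as recognizing in the text above. Layout and names are illustrative.
CTD_READERS = {
    "Ser5P": ["capping enzyme (GTase)", "Set1/COMPASS", "Rtr1", "Scp1",
              "Rpd3S", "Nrd1"],
    "Ser2P+Ser5P": ["Prp40", "U2AF65", "Yra1", "Set2", "Ssd1", "Hrr25",
                    "RecQ5"],
    "Ser2P": ["Npl3", "Spt6", "Rtt103", "Pcf11"],
}


def readers_for(marks):
    """Return the factors tabulated for a set of phosphoserine marks.

    `marks` is a set such as {"Ser2P", "Ser5P"}. Only Ser2P and Ser5P are
    considered, because those are the patterns tabulated above; this is a
    lookup of the text's summary, not a model of binding.
    """
    key = "+".join(sorted(m for m in marks if m in {"Ser2P", "Ser5P"}))
    return CTD_READERS.get(key, [])


if __name__ == "__main__":
    for pattern in ({"Ser5P"}, {"Ser2P", "Ser5P"}, {"Ser2P"}):
        print(sorted(pattern), "->", ", ".join(readers_for(pattern)))
```

A flat table of this kind makes the progression along the gene easy to see: Ser5P readers act early, double-mark readers during elongation, and Ser2P-only readers towards the 3′ end.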

Trends in Genetics July 2012, Vol. 28, No. 7

[Figure 4 graphic not reproduced. Panel (a): protein-coding genes (yeast and mammals). Panel (b): snRNA genes (mammals) and sn/snoRNA genes (yeast). Stage labels along the transcription cycle, from Start to End: initiation; histone modification and RNA 5′ end capping; histone modification, splicing and elongation; RNA 3′ end processing and termination. Factors labelled in the graphic: Mediator (1 MDa), TBP, capping enzyme (Cgt1, Mce1, Hce1), Scp1, Rtr1, Rpd3S, Pin1/Ess1, Prp40, CA150, RecQ5, U2AF65, Set2, Ssd1, Hrr25, Yra1, Spt6, Npl3, Ssu72, Pcf11, Rtt103, Nrd1, RPAP2 and Integrator (Int4, Int11).]

Figure 4. The role of carboxyl-terminal domain (CTD) modification in recruitment of factors to protein-coding and non-coding genes. The effect of modification of residues in the heptapeptide on protein binding has now been determined for several factors shown to interact with the CTD. Modification of the heptapeptide is shown as in Figure 1. CTD-interacting proteins are shown as light-blue blobs, with their function during the transcription cycle noted at the left. (a) Factors binding specifically to the differentially modified CTD at different stages of the transcription cycle of genes encoding polyadenylated mRNA. Ssu72 probably functions late during the transcription cycle. In support of this, the polyadenylation factor Symplekin stimulates Ssu72 Ser5P dephosphorylation activity, which enhances transcription-coupled polyadenylation in vitro [53]. (b) Factors binding specifically to the differentially modified CTD at different stages of the transcription cycle of non-coding mammalian small nuclear (sn)RNA genes (left) and non-coding yeast small nucleolar (sno)RNA genes (right). Int11 is probably complexed with Int9 [73]. The latest model for RNA 3′ end formation and/or termination of transcription of yeast snRNA and/or snoRNA genes predicts that Ssu72-dependent removal of Ser5P promotes Nrd1 release and Pcf11 recruitment [51,74]. It has been assumed that TBP and the capping enzyme interact with the CTD in the same way in the expression of protein-coding and non-coding genes.

The CTD has the ability to adopt numerous conformations that can interact with a range of different CTD-interacting domains. These include the WW domain (Pin1, Ess1 and Prp40), the FF domain (CA150 and Prp40), the SRI domain (Set2 and RecQ5) [9], an unusual tandem SH2 domain (Spt6) [75–78] and the CID domain of Pcf11, Rtt103 and Nrd1 [61,67]. The factors whose interaction with the CTD has been well studied appear to recognize the modified heptapeptide (or heptapeptide pair) through induced fit. See [3] for further details. The splicing regulator SRp20 also requires the CTD to mediate its function, but the binding does not appear to be direct [79,80].

Thr4P facilitates recruitment of 3′ processing factors to replication-activated histone genes in chicken [39]. However, it is not yet clear whether recruitment involves direct interaction of this mark with processing factors that are specific for these genes. Ser7P is required for correct transcription of human snRNA genes and 3′ processing of the transcripts [36]. Ser2 kinases are needed only for snRNA 3′ end processing and not for elongation of transcription. In addition, the transcripts from these genes are neither spliced nor polyadenylated, and RNA 3′ processing is directed instead by a 3′ box [68]. The molecular mechanisms by which the Ser7P mark influences expression of these genes and processing of the transcripts are becoming clearer. The RPAP2 phosphatase is recruited to snRNA genes through Ser7P, close to the promoter region where Ser7P is most abundant [14]. The interaction between RPAP2 and the CTD is direct and is abrogated by Ser7 mutation to Ala [45]. RPAP2 recruited to snRNA genes is closely associated with a subcomplex of the snRNA gene-specific RNA 3′ processing Integrator complex missing the Int11 catalytic subunit. The CTD phosphorylation pattern on these genes [14] suggests that, as transcription progresses, RPAP2 dephosphorylates Ser5 of the CTD and Ser2 is phosphorylated by P-TEFb, creating a ‘double mark’ composed of Ser7P on one repeat and Ser2P on the following repeat, which is specifically recognized by Int11 [69]. Therefore, this combination can act as a signal for recruitment of Int11 at the end of the RNA-encoding region. Thus, the RNA 3′ end processing defect that occurs when Ser7 is mutated to alanine [36] is explained by the lack of Integrator recruitment, and the associated transcription defect may be because of the inability of Ser5P to be dephosphorylated [43]. Interestingly, RPAP2 appears to be recruited to protein-coding genes through a distinct mechanism, independent of Ser7P [14]. Two proteins, RPRD1A and RPRD1B, which interact

Box 2. Outstanding questions for future research
• Which modifications can coexist (or are mutually exclusive) on the same CTD repeat?
• Are differences in CTD phosphorylation patterns along genes a cause or consequence of differences in expression of the genes?
• How is the uniform phosphorylation pattern on yeast protein-coding genes achieved?
• Are marks different on different parts of the CTD; that is, can individual repeats or pairs of repeats of the same CTD be differentially modified?
• How many proteins can bind the CTD simultaneously and in which order?
• Does a factor always bind to the same place on the CTD?
• Does a specific RNA-processing factor bind to methyl-R1810 of non-consensus repeat 31 of the mammalian CTD?
• What is the role of the other non-consensus repeats? Are any other residues modified; for example, are the Lys residues acetylated, methylated, ubiquitylated, and so on?
• Do the non-consensus repeats help restrict the binding of some proteins to the proximal part of the CTD? Do they recruit specific factors in addition to TDRD3?
• Which transcription and/or RNA-processing factors are affected by phosphorylation of Thr4?
• What is the effect of Thr4 and Ser7 phosphorylation and/or Arg methylation on the CTD-binding factors identified so far?
• Does Ser7 or its phosphorylation play any role in expression of protein-coding genes?
• Do any other marks, or combination of marks, have gene-specific roles?

both with RPAP2 and the CTD, could help recruit the Ser5 phosphatase to protein-coding genes in the absence of Ser7P [56].

Concluding remarks
Over the past 3 years, a wealth of new information about pol II CTD modification and CTD-interacting factors has been generated. Some of the new findings have answered questions we posed in the last review [3]. For example, Ser7 kinases have been identified (Table 1) by making known kinases sensitive to specific inhibition [17,26]. In addition, antibodies raised against putative modifications have identified methylation of R1810 and Thr4P, whereas CTD mutational analysis has uncovered gene-specific roles for these modifications. However, some of the new findings prompted further important questions about the CTD code and how it is translated into biological functions. Accordingly, we have updated the list of important outstanding questions (Box 2). We predict that, over the next few years, important functions will be attributed to modifications that have already been described, such as phosphorylation of Tyr and glycosylation of serines, and that novel CTD modifications with important functions will be identified.

Acknowledgments
S.E. was supported by a grant from the Centre National de la Recherche Scientifique, and S.M. and M.D. were supported by grants from the Wellcome Trust and EPA Trust. We apologize for not citing many relevant publications owing to space constraints.

References
1 Chapman, R.D. et al. (2008) Molecular evolution of the RNA polymerase II CTD. Trends Genet. 24, 289–296
2 Liu, P. et al. (2010) Genetic organization, length conservation, and evolution of RNA polymerase II carboxyl-terminal domain. Mol. Biol. Evol. 27, 2628–2641
3 Egloff, S. and Murphy, S. (2008) Cracking the RNA polymerase II CTD code. Trends Genet. 24, 280–288
4 Buratowski, S. (2003) The CTD code. Nat. Struct. Biol. 10, 679–680
5 Corden, J.L. (2007) Transcription. Seven ups the code. Science 318, 1735–1736
6 West, M.L. and Corden, J.L. (1995) Construction and analysis of yeast RNA polymerase II CTD deletion and substitution mutations. Genetics 140, 1223–1233
7 Bartolomei, M.S. et al. (1988) Genetic analysis of the repetitive carboxyl-terminal domain of the largest subunit of mouse RNA polymerase II. Mol. Cell. Biol. 8, 330–339
8 Buratowski, S. (2009) Progression through the RNA polymerase II CTD cycle. Mol. Cell 36, 541–546
9 Kanagaraj, R. et al. (2010) RECQ5 helicase associates with the C-terminal repeat domain of RNA polymerase II during productive elongation phase of transcription. Nucleic Acids Res. 38, 8131–8140
10 MacKellar, A.L. and Greenleaf, A.L. (2011) Cotranscriptional association of mRNA export factor Yra1 with C-terminal domain of RNA polymerase II. J. Biol. Chem. 286, 36385–36395
11 Li, X. and Manley, J.L. (2005) Inactivation of the SR protein splicing factor ASF/SF2 results in genomic instability. Cell 122, 365–378
12 Sims, R.J., 3rd et al. (2011) The C-terminal domain of RNA polymerase II is modified by site-specific methylation. Science 332, 99–103
13 Chapman, R.D. et al. (2007) Transcribing RNA polymerase II is phosphorylated at CTD residue serine-7. Science 318, 1780–1782
14 Egloff, S. et al. (2009) Chromatin structure is implicated in ‘late’ elongation checkpoints on the U2 snRNA and beta-actin genes. Mol. Cell. Biol. 29, 4002–4013
15 Gomes, N.P. et al. (2006) Gene-specific requirement for P-TEFb activity and RNA polymerase II phosphorylation within the p53 transcriptional program. Genes Dev. 20, 601–612
16 Cheng, C. and Sharp, P.A. (2003) RNA polymerase II accumulation in the promoter-proximal region of the dihydrofolate reductase and gamma-actin genes. Mol. Cell. Biol. 23, 1961–1967
17 Glover-Cutter, K. et al. (2009) TFIIH-associated Cdk7 kinase functions in phosphorylation of C-terminal domain Ser7 residues, promoter-proximal pausing, and termination by RNA polymerase II. Mol. Cell. Biol. 29, 5455–5464
18 Bataille, A.R. et al. (2012) A universal RNA polymerase II CTD cycle is orchestrated by complex interplays between kinase, phosphatase, and isomerase enzymes along genes. Mol. Cell 45, 158–170
19 Kim, H. et al. (2010) Gene-specific RNA polymerase II phosphorylation and the CTD code. Nat. Struct. Mol. Biol. 17, 1279–1286
20 Mayer, A. et al. (2010) Uniform transitions of the general RNA polymerase II transcription complex. Nat. Struct. Mol. Biol. 17, 1272–1278
21 Zhang, D.W. et al. (2012) Ssu72 phosphatase dependent erasure of phospho-Ser7 marks on the RNA Polymerase II C-terminal domain is essential for viability and transcription termination. J. Biol. Chem. 287, 8541–8551
22 Kanin, E.I. et al. (2007) Chemical inhibition of the TFIIH-associated kinase Cdk7/Kin28 does not impair global mRNA synthesis. Proc. Natl. Acad. Sci. U.S.A. 104, 5812–5817
23 Kim, M. et al. (2009) Phosphorylation of the yeast Rpb1 C-terminal domain at serines 2, 5, and 7. J. Biol. Chem. 284, 26421–26426
24 Viladevall, L. et al. (2009) TFIIH and P-TEFb coordinate transcription with capping enzyme recruitment at specific genes in fission yeast. Mol. Cell 33, 738–751
25 Hong, S.W. et al. (2009) Phosphorylation of the RNA polymerase II C-terminal domain by TFIIH kinase is not essential for transcription of Saccharomyces cerevisiae genome. Proc. Natl. Acad. Sci. U.S.A. 106, 14276–14280
26 Akhtar, M.S. et al. (2009) TFIIH kinase places bivalent marks on the carboxy-terminal domain of RNA polymerase II. Mol. Cell 34, 387–393
27 Galbraith, M.D. et al. (2010) CDK8: A positive regulator of transcription. Transcription 1, 4–12

28 Qiu, H. et al. (2009) Phosphorylation of the Pol II CTD by KIN28 enhances BUR1/BUR2 recruitment and Ser2 CTD phosphorylation near promoters. Mol. Cell 33, 752–762
29 Zhou, K. et al. (2009) Control of transcriptional elongation and cotranscriptional histone modification by the yeast BUR kinase substrate Spt5. Proc. Natl. Acad. Sci. U.S.A. 106, 6956–6961
30 Liu, Y. et al. (2009) Phosphorylation of the transcription elongation factor Spt5 by yeast Bur1 kinase stimulates recruitment of the PAF complex. Mol. Cell. Biol. 29, 4852–4863
31 Tietjen, J.R. et al. (2010) Chemical-genomic dissection of the CTD code. Nat. Struct. Mol. Biol. 17, 1154–1161
32 Bres, V. et al. (2008) The multi-tasking P-TEFb complex. Curr. Opin. Cell Biol. 20, 334–340
33 Blazek, D. et al. (2011) The Cyclin K/Cdk12 complex maintains genomic stability via regulation of expression of DNA damage response genes. Genes Dev. 25, 2158–2172
34 Bartkowiak, B. et al. (2010) CDK12 is a transcription elongation-associated CTD kinase, the metazoan ortholog of yeast Ctk1. Genes Dev. 24, 2303–2316
35 Bartkowiak, B. and Greenleaf, A.L. (2012) Phosphorylation of RNAPII: to P-TEFb or not to P-TEFb? Transcription 2, 115–119
36 Egloff, S. et al. (2007) Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science 318, 1777–1779
37 Schwer, B. and Shuman, S. (2011) Deciphering the RNA polymerase II CTD code in fission yeast. Mol. Cell 43, 311–318
38 Boeing, S. et al. (2010) RNA polymerase II C-terminal heptarepeat domain Ser-7 phosphorylation is established in a mediator-dependent fashion. J. Biol. Chem. 285, 188–196
39 Hsin, J.P. et al. (2011) RNAP II CTD phosphorylated on threonine-4 is required for histone mRNA 3′ end processing. Science 334, 683–686
40 Pirngruber, J. et al. (2009) CDK9 directs H2B monoubiquitination and controls replication-dependent histone mRNA 3′-end processing. EMBO Rep. 10, 894–900
41 Yeo, M. et al. (2005) Small CTD phosphatases function in silencing neuronal gene expression. Science 307, 596–600
42 Meinhart, A. et al. (2005) A structural perspective of CTD function. Genes Dev. 19, 1401–1415
43 Mosley, A.L. et al. (2009) Rtr1 is a CTD phosphatase that regulates RNA polymerase II during the transition from serine 5 to serine 2 phosphorylation. Mol. Cell 34, 168–178
44 Jeronimo, C. et al. (2007) Systematic analysis of the protein interaction network for the human transcription machinery reveals the identity of the 7SK capping enzyme. Mol. Cell 27, 262–274
45 Egloff, S. et al. (2012) Ser7 phosphorylation of the CTD recruits the RPAP2 Ser5 phosphatase to snRNA genes. Mol. Cell 45, 111–122
46 Clemente-Blanco, A. et al. (2011) Cdc14 phosphatase promotes segregation of telomeres through repression of RNA polymerase II transcription. Nat. Cell Biol. 13, 1450–1456
47 Coudreuse, D. et al. (2010) A gene-specific requirement of RNA polymerase II CTD phosphorylation for sexual differentiation in S. pombe. Curr. Biol. 20, 1053–1064
48 Nechaev, S. and Adelman, K. (2011) Pol II waiting in the starting gates: regulating the transition from transcription initiation into productive elongation. Biochim. Biophys. Acta 1809, 34–45
49 Brookes, E. et al. (2012) Polycomb associates genome-wide with a specific RNA polymerase II variant, and regulates metabolic genes in ESCs. Cell Stem Cell 10, 157–170
50 Noble, C.G. et al. (2005) Key features of the interaction between Pcf11 CID and RNA polymerase II CTD. Nat. Struct. Mol. Biol. 12, 144–151
51 Werner-Allen, J.W. et al. (2011) cis-Proline-mediated Ser(P)5 dephosphorylation by the RNA polymerase II C-terminal domain phosphatase Ssu72. J. Biol. Chem. 286, 5717–5726
52 Singh, N. et al. (2009) The Ess1 prolyl isomerase is required for transcription termination of small noncoding RNAs via the Nrd1 pathway. Mol. Cell 36, 255–266
53 Xiang, K. et al. (2010) Crystal structure of the human symplekin-Ssu72-CTD phosphopeptide complex. Nature 467, 729–733
54 Xu, Y.X. et al. (2003) Pin1 modulates the structure and function of human RNA polymerase II. Genes Dev. 17, 2765–2776

55 Phatnani, H.P. et al. (2004) Expanding the functional repertoire of CTD kinase I and RNA polymerase II: novel phosphoCTD-associating proteins in the yeast proteome. Biochemistry 43, 15702–15719
56 Ni, Z. et al. (2011) Control of the RNA polymerase II phosphorylation state in promoter regions by CTD interaction domain-containing proteins RPRD1A and RPRD1B. Transcription 2, 237–242
57 Ghosh, A. et al. (2011) Structural insights to how mammalian capping enzyme reads the CTD code. Mol. Cell 43, 299–310
58 Krogan, N.J. et al. (2003) The Paf1 complex is required for histone H3 methylation by COMPASS and Dot1p: linking transcriptional elongation to histone methylation. Mol. Cell 11, 721–729
59 Zhang, Y. et al. (2006) Determinants for dephosphorylation of the RNA polymerase II C-terminal domain by Scp1. Mol. Cell 24, 759–770
60 Govind, C.K. et al. (2010) Phosphorylated Pol II CTD recruits multiple HDACs, including Rpd3C(S), for methylation-dependent deacetylation of ORF nucleosomes. Mol. Cell 39, 234–246
61 Vasiljeva, L. et al. (2008) The Nrd1-Nab3-Sen1 termination complex interacts with the Ser5-phosphorylated RNA polymerase II C-terminal domain. Nat. Struct. Mol. Biol. 15, 795–804
62 Gudipati, R.K. et al. (2008) Phosphorylation of the RNA polymerase II C-terminal domain dictates transcription termination choice. Nat. Struct. Mol. Biol. 15, 786–794
63 David, C.J. et al. (2011) The RNA polymerase II C-terminal domain promotes splicing activation through recruitment of a U2AF65-Prp19 complex. Genes Dev. 25, 972–983
64 Bono, F. and Gehring, N.H. (2011) Assembly, disassembly and recycling: the dynamics of exon junction complexes. RNA Biol. 8, 24–29
65 Li, M. et al. (2005) Solution structure of the Set2-Rpb1 interacting domain of human Set2 and its interaction with the hyperphosphorylated C-terminal domain of Rpb1. Proc. Natl. Acad. Sci. U.S.A. 102, 17636–17641
66 Drouin, S. et al. (2010) DSIF and RNA polymerase II CTD phosphorylation coordinate the recruitment of Rpd3S to actively transcribed genes. PLoS Genet. 6, e1001173
67 Lunde, B.M. et al. (2010) Cooperative interaction of transcription termination factors with the RNA polymerase II C-terminal domain. Nat. Struct. Mol. Biol. 17, 1195–1201
68 Egloff, S. et al. (2008) Expression of human snRNA genes from beginning to end. Biochem. Soc. Trans. 36, 590–594
69 Egloff, S. et al. (2010) The integrator complex recognizes a new double mark on the RNA polymerase II carboxyl-terminal domain. J. Biol. Chem. 285, 20564–20569
70 Kelly, W.G. et al. (1993) RNA polymerase II is a glycoprotein. Modification of the COOH-terminal domain by O-GlcNAc. J. Biol. Chem. 268, 10416–10424
71 Comer, F.I. and Hart, G.W. (2001) Reciprocity between O-GlcNAc and O-phosphate on the carboxyl terminal domain of RNA polymerase II. Biochemistry 40, 7845–7852
72 Stiller, J.W. and Cook, M.S. (2004) Functional unit of the RNA polymerase II C-terminal domain lies within heptapeptide pairs. Eukaryot. Cell 3, 735–740
73 Baillat, D. et al. (2005) Integrator, a multiprotein mediator of small nuclear RNA processing, associates with the C-terminal repeat of RNA polymerase II. Cell 123, 265–276
74 Kim, M. et al. (2006) Distinct pathways for snoRNA and mRNA termination. Mol. Cell 24, 723–734
75 Sun, M. et al. (2010) A tandem SH2 domain in transcription elongation factor Spt6 binds the phosphorylated RNA polymerase II C-terminal repeat domain (CTD). J. Biol. Chem. 285, 41597–41603
76 Diebold, M.L. et al. (2010) Noncanonical tandem SH2 enables interaction of elongation factor Spt6 with RNA polymerase II. J. Biol. Chem. 285, 38389–38398
77 Close, D. et al. (2011) Crystal structures of the S. cerevisiae Spt6 core and C-terminal tandem SH2 domain. J. Mol. Biol. 408, 697–713
78 Liu, J. et al. (2011) Solution structure of tandem SH2 domains from Spt6 protein and their binding to the phosphorylated RNA polymerase II C-terminal domain. J. Biol. Chem. 286, 29218–29226
79 Munoz, M.J. et al. (2010) The carboxy terminal domain of RNA polymerase II and alternative splicing. Trends Biochem. Sci. 35, 497–504