Journal Pre-proof The features and regulation of co-transcriptional splicing in Arabidopsis Danling Zhu, Fei Mao, Yuanchun Tian, Xiaoya Lin, Lianfeng Gu, Hongya Gu, Li-jia Qu, Yufeng Wu, Zhe Wu
PII: S1674-2052(19)30368-5
DOI: https://doi.org/10.1016/j.molp.2019.11.004
Reference: MOLP 851
To appear in: MOLECULAR PLANT Accepted Date: 15 November 2019
Please cite this article as: Zhu D., Mao F., Tian Y., Lin X., Gu L., Gu H., Qu L.-j., Wu Y., and Wu Z. (2019). The features and regulation of co-transcriptional splicing in Arabidopsis. Mol. Plant. doi: https://doi.org/10.1016/j.molp.2019.11.004. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. © 2019 The Author
Title: The features and regulation of co-transcriptional splicing in Arabidopsis

Authors: Danling Zhu1,†, Fei Mao2,†, Yuanchun Tian2, Xiaoya Lin3,5, Lianfeng Gu4, Hongya Gu3, Li-jia Qu3, Yufeng Wu2,* and Zhe Wu1,*

1 Institute of Plant and Food Research, Department of Biology, Southern University of Science and Technology, Shenzhen, China, 518055.
2 State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, Nanjing Agriculture University, Nanjing, Jiangsu 210095, China.
3 State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, College of Life Sciences, Peking University, Beijing, China, 100871.
4 Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, China, 350002.
5 Current address: School of Life Sciences, Guangzhou University, Guangzhou, China, 510006.

†These authors contributed equally.
*Co-corresponding authors: [email protected] and [email protected]

Running title: CB-RNA-seq and eCLIP-seq reveal features of CTS in plants.

Short Summary:
Splicing and transcription are often coupled, as demonstrated in various organisms. However, plant genes are much shorter than mammalian genes, which raises the question of how frequently plant genes can be spliced during transcription. By developing chromatin-bound RNA-seq and eCLIP-seq, we studied the features and regulation of co-transcriptional splicing, which involves direct RNA binding by an hnRNP-like protein, RZ-1C.

ABSTRACT
Precursor mRNA (pre-mRNA) splicing is essential for gene expression in most eukaryotic organisms. Previous studies from mammals, Drosophila and yeast show that the majority of splicing events occur co-transcriptionally. In plants, however, the nature of co-transcriptional splicing (CTS) and its regulation are still largely unknown. Here, we used chromatin-bound RNA sequencing (CB-RNA-seq) to study CTS in Arabidopsis thaliana. CTS was widespread in Arabidopsis seedlings, with a large proportion of alternative splicing events determined co-transcriptionally. CTS efficiency correlated with gene expression level, the chromatin landscape and, most surprisingly, the number of introns and exons of individual genes, but was independent of gene length. In combination with eCLIP-seq analysis, our results showed that the hnRNP-like proteins RZ-1B and RZ-1C promote efficient CTS globally through direct binding, frequently to exonic sequences. Notably, this general effect of RZ-1B/1C on splicing promotion was mainly at the chromatin level, not at the mRNA level. RZ-1C promotes CTS of multi-exon genes in association with its binding to regions both proximal and distal to the regulated introns. We propose that RZ-1C promotes efficient CTS of genes with multiple exons through cooperative interactions with many exons, introns and splicing factors. Our work thus reveals important features of CTS in plants and provides methodologies for the investigation of CTS and RNA-binding proteins in plants.

Introduction:
Pre-mRNA splicing is carried out by the spliceosome, a megadalton complex that triggers the precise excision of introns from the pre-mRNA (Jangi and Sharp, 2014; Reddy et al., 2013). Early studies of individual genes implied that splicing could already occur during transcription, a process termed CTS (Bauren and Wieslander, 1994; Beyer and Osheim, 1988; Osheim et al., 1985; Zhang et al., 1994). The potential for CTS is not surprising in organisms with relatively long genes (e.g., human), since the time needed for splicing (a few seconds to 3 min) is much shorter than the time needed to transcribe a gene (often more than 10 min for human genes) (Alexander et al., 2010; Bentley, 2014; Beyer and Osheim, 1988; Huranova et al., 2010; Singh and Padgett, 2009). However, recent genome-wide profiling of nascent RNA has revealed that CTS is the predominant mode of splicing in different species including budding yeast (Ameur et al., 2011; Bhatt et al., 2012; Khodor et al., 2011; Khodor et al., 2012; Tilgner et al., 2012; Oesterreich et al., 2016), an organism with an average gene length of 1-2 kb, indicating rapid CTS. Indeed, in budding yeast, the catalysis of splicing occurs on average 45 bp after Pol II finishes transcribing the intron 3ʹ acceptor site, much faster than previously anticipated (Oesterreich et al., 2016). Interestingly, in contrast, two recent reports showed that the catalysis of splicing in human cells often occurs when Pol II has traveled more than a kilobase downstream of the intron acceptor site (Drexler et al., 2019; Nojima et al., 2018), indicating substantially different regulation between species.

Despite their tight coupling in vivo, transcription and splicing can each occur independently of the other in vitro. How and why these two processes are so tightly coupled in vivo has been the focus of much effort in yeast and mammalian cells. The extent of CTS correlates with different gene features and varies between organisms, tissues and individual genes, indicating great regulatory potential (Brugiolo et al., 2013; Khodor et al., 2012). In particular, the transcription elongation rate, which is regulated by trans factors and chromatin structure, can influence splice site choice (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998). A faster elongation rate could reduce the chance that weak, promoter-proximal splice sites are used, influencing the production of certain mRNA isoforms (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998). This is known as “kinetic coupling”, a mechanism that has been demonstrated in several different organisms (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998) and is involved in flowering time control (Marquardt et al., 2014) and light response (Godoy Herz et al., 2019) in Arabidopsis. In addition, a study in budding yeast showed that slower or faster elongation could affect the fidelity of splicing in addition to its efficiency, indicating that an optimal elongation rate has likely been selected for in vivo (Aslanzadeh et al., 2018). Notably, recent work in mammalian cells showed that the spliceosome is complexed with Pol II whose CTD is phosphorylated on serine 5 (S5P) but not with Pol II whose CTD is phosphorylated on serine 2 (S2P), suggesting a physical link between splicing and transcription through phosphorylation of the Pol II CTD (Nojima et al., 2015; Nojima et al., 2018). In addition, splicing and transcription can also be coupled by splicing factors that in turn regulate transcription (Das et al., 2007; de la Mata and Kornblihtt, 2006; Ji et al., 2013; Monsalve et al., 2000). The coupling between splicing and transcription is likely also important for RNA quality control. For example, the deposition of the exon-junction complex, a complex important for nonsense-mediated decay of RNA, is largely splicing dependent and co-transcriptional (Alexandrov et al., 2012; Barbosa et al., 2012; Singh et al., 2012; Steckelberg et al., 2012).

Despite its importance, relatively little is known about the coupling between transcription and splicing in plants. In Arabidopsis, CTS has been observed at individual loci such as FLC and DOG1 (Dolata et al., 2015; Rosa et al., 2016; Wu et al., 2016a). For example, single-molecule RNA FISH (smFISH) results show only two dots for unspliced FLC RNA within the nucleus, and these overlap with the FLC DNA FISH signal, indicating efficient co-transcriptional splicing (Rosa et al., 2016). In addition, pNET-seq analysis in Arabidopsis found an enrichment of reads whose 3ʹ ends map to exon 3ʹ ends (most likely splicing intermediates) specifically for Pol II CTD S5P (Zhu et al., 2018), consistent with the idea of CTS. Notably, typical plant (e.g., Arabidopsis thaliana) genes are structurally similar to yeast genes, with an average length of 2.4 kb (Reddy, 2007), much shorter than those of mammals, which should theoretically reduce the chance for CTS. Moreover, the majority of alternative splicing events in plants are intron retention, in sharp contrast to the situation in mammals (Reddy, 2007). To date, the extent of CTS, as well as its features and regulation at the whole-genome level, is unknown in plants.

RNA-binding proteins are major regulators that determine the fate of RNAs, including splicing, either co- or post-transcriptionally. The Arabidopsis genome encodes at least 500 putative RNA-binding proteins, the majority of which are functionally uncharacterized (Marondedze et al., 2016; Reichel et al., 2016). Indeed, methods for studying RNA-binding proteins, such as CLIP-seq, have only recently become available and remain technically challenging for plant samples (Meyer et al., 2017; Zhang et al., 2015). A relatively well-studied class of RNA-binding proteins in plants is the SR proteins, a group of proteins that have been extensively linked with alternative splicing (Reddy and Shad Ali, 2011), presumably through their interactions with sequence motifs located in exons or introns (Jeong, 2017). Mammalian SR proteins also regulate Pol II 5ʹ end pausing and elongation (Ji et al., 2013; Lin et al., 2008; Xiao et al., 2019), suggesting important roles at the co-transcriptional level. We reported previously that a small group of proteins, namely RZ-1B and RZ-1C, are in close association with SR proteins in Arabidopsis (Wu et al., 2016b). Both proteins feature an N-terminal RRM domain, a C-terminal low-complexity region and a zinc-finger motif in the middle (Lorkovic and Barta, 2002). RZ-1s are hnRNPs (Hanano et al., 1996; Lorkovic and Barta, 2002) and have the potential to interact with multiple SR proteins (Wu et al., 2016b). RZ-1B and RZ-1C have redundant and essential functions in plant development and regulate alternative splicing of more than 100 genes (Wu et al., 2016b). Notably, RZ-1C tightly associates with chromatin and could affect transcription in addition to splicing at individual genes such as FLC, suggesting a possible role in co-transcriptional regulation (Wu et al., 2016b).

Here we studied the features and regulation of CTS in plants by adapting chromatin-bound RNA-seq to Arabidopsis. CTS is widespread in Arabidopsis seedlings. The efficiency of CTS correlates with gene expression level, the distance from the intron 3ʹ end to the gene 3ʹ end and histone modifications, consistent with previous reports in yeast and animals and supporting the idea of kinetic coupling. As a novel feature of CTS in plants, we found that the average CTS efficiency of genes correlated with their intron and/or exon numbers while being largely independent of gene length. We further explored the roles of RZ-1B and RZ-1C in CTS regulation. RZ-1B and RZ-1C function to enhance CTS globally. The promotion of CTS by RZ-1C is associated with its direct binding to RNA targets, as revealed through eCLIP-seq, an efficient and powerful technique that we adapted for studying the direct RNA binding of proteins in plants. Our work uncovers general features of CTS in plants and highlights the roles of exon/intron numbers as well as RZ-1B/1C in CTS regulation.

Results

Widespread CTS as revealed by CB-RNA-seq in Arabidopsis seedlings.
To test the universality of CTS in plants, we adapted the CB-RNA-seq method for use in Arabidopsis. CB-RNA was extracted from isolated nuclei, followed by a stringent urea wash to release non-chromatin-associated proteins and RNAs, a method that has been widely used in both mammalian and yeast systems (Bhatt et al., 2012; Khodor et al., 2011; Oesterreich et al., 2016; Wuarin and Schibler, 1994) (Fig. 1A; Supplementary Fig. 1A and B). We used the ratio of unspliced RNA to spliced RNA at several different genes as an estimate of the enrichment of nascent RNA in our assay. These included FLC, a gene that shows tight co-transcriptional splicing (Rosa et al., 2016). Given the efficient removal of introns at FLC chromatin, one would expect the highest enrichment of unspliced RNA in the chromatin fraction compared with the nuclear and cytosolic fractions. Indeed, this was the case at FLC and the two other genes we tested, indicating successful enrichment of chromatin-bound RNA (Supplementary Fig. 1C to E). The CB-RNA fraction was treated to remove ribosomal RNA and any contaminating poly(A) RNA and then used for strand-specific Illumina RNA-seq.

For a given locus, the CB-RNA reflects the sum of two populations of RNA molecules (Fig. 1B) (Ietswaart et al., 2017; Wu et al., 2016a): RNAs being transcribed at the locus (RNAe, elongating RNA) and RNAs that have been fully transcribed but not yet released from the chromatin (RNAf, full-length RNA). The latter fraction exists because RNA polymerase II (Pol II) generally pauses at the 3ʹ end of the gene before termination is completed (Zhu et al., 2018). Assuming that RNAe dominates the CB-RNA fraction, we expect a 5ʹ to 3ʹ decrease in the amount of CB-RNA (Fig. 1B). Furthermore, assuming that introns are removed from the pre-mRNA co-transcriptionally, the read coverage at intronic regions should be significantly lower than at the adjacent exonic regions, resulting in a sawtooth-like pattern (Fig. 1C).

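The expectation described above can be illustrated with a toy calculation. The sketch below is not the authors' analysis. It assumes elongating polymerases are evenly spaced along the gene, that RNAf molecules are fully spliced, and that an intron is removed once Pol II is a fixed number of nucleotides (`splice_lag`, a hypothetical parameter) past the intron 3ʹ end. All function and parameter names are illustrative.

```python
def expected_coverage(gene_len, introns, n_elongating=100.0,
                      n_full_length=10.0, splice_lag=0):
    """Expected CB-RNA coverage per position of a toy gene.

    introns: list of (start, end) half-open intervals, 0-based.
    Assumptions (illustrative only): polymerases are evenly spaced,
    RNAf is fully spliced, and an intron is excised once Pol II is
    `splice_lag` nt past the intron 3' end.
    """
    per_nt = n_elongating / gene_len  # elongating polymerases per nucleotide
    coverage = []
    for x in range(gene_len):
        intron = next(((s, e) for s, e in introns if s <= x < e), None)
        if intron is None:
            # Exonic base: carried by every nascent RNA whose polymerase
            # is downstream of x (RNAe), plus all full-length RNAs (RNAf).
            cov = per_nt * (gene_len - x) + n_full_length
        else:
            # Intronic base: carried only while the polymerase is between
            # x and the point at which the intron is assumed to be excised.
            limit = min(gene_len, intron[1] + splice_lag)
            cov = per_nt * (limit - x)
        coverage.append(cov)
    return coverage


if __name__ == "__main__":
    cov = expected_coverage(1000, [(300, 500)])
    print(cov[0], cov[299], cov[400], cov[500], cov[999])
```

Under these assumptions exonic coverage declines from 5ʹ to 3ʹ and drops inside the intron, which reproduces the sawtooth shape qualitatively. Increasing `splice_lag` raises intronic coverage, mimicking less efficient CTS.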
As expected, we observed a declining pattern from the 5ʹ to the 3ʹ end at many individual loci in our CB-RNA-seq data, but not in the mRNA-seq data (Fig. 1D and E; Supplementary Fig. 1F), indicating successful isolation of CB-RNA. We note that this declining pattern was observed more often in long genes than in short ones, possibly because Pol II needs more time for elongation at these genes, so that RNAe dominates over RNAf at the locus. Moreover, the sawtooth pattern of CB-RNA was observed at individual genes as well as globally, indicating general CTS (Fig. 1D to F; Supplementary Fig. 1F). Given the good read coverage of introns as well as exons, and the good correlation between biological replicates (Supplementary Fig. 1G, Supplementary Table 1), we then quantified CTS efficiency throughout the Arabidopsis genome using widely accepted standards (Herzel and Neugebauer, 2015; Khodor et al., 2011; Khodor et al., 2012). At each intron–exon boundary, we calculated the ratio of intronic reads over the adjacent exonic reads to yield a 5ʹ splice site (5ʹSS) ratio and a 3ʹ splice site (3ʹSS) ratio (Fig. 1G). A lower ratio thus indicates more efficient co-transcriptional removal of the intron. The median value for both 5ʹSS (88,364 introns) and 3ʹSS (89,231 introns) ratios was around 0.2 (Fig. 1H), similar to that found in Drosophila (Khodor et al., 2011), but lower than that in mouse (Khodor et al., 2012). Therefore, we conclude that Arabidopsis pre-mRNAs are generally co-transcriptionally spliced, as in yeast and animals.

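As a minimal sketch of the 5ʹSS and 3ʹSS ratios described above, the function below compares mean read coverage just inside an intron with mean coverage in the flanking exon. The window size (`window`, 25 nt here) and the use of per-base coverage are assumptions made for illustration; the exact counting scheme is described in the Methods and is not reproduced here.

```python
def splice_site_ratios(coverage, intron_start, intron_end, window=25):
    """Return (5'SS ratio, 3'SS ratio) for one intron.

    coverage: per-base read coverage along the transcript (list of numbers).
    intron_start, intron_end: half-open intron interval, 0-based.
    window: hypothetical number of bases compared on each side of a boundary.
    A ratio is None when the flanking exon has no coverage.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    upstream_exon = mean(coverage[max(0, intron_start - window):intron_start])
    intron_5p = mean(coverage[intron_start:min(intron_start + window, intron_end)])
    intron_3p = mean(coverage[max(intron_end - window, intron_start):intron_end])
    downstream_exon = mean(coverage[intron_end:intron_end + window])

    ratio_5ss = intron_5p / upstream_exon if upstream_exon > 0 else None
    ratio_3ss = intron_3p / downstream_exon if downstream_exon > 0 else None
    return ratio_5ss, ratio_3ss


if __name__ == "__main__":
    # Toy locus: exon (coverage 10), intron (coverage 2), exon (coverage 10).
    toy = [10] * 100 + [2] * 100 + [10] * 100
    print(splice_site_ratios(toy, 100, 200))
```

In this toy locus both ratios are 0.2, the same order as the genome-wide median reported above. Values near 0 correspond to efficient co-transcriptional removal and values near 1 to little or no removal on chromatin.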
CTS efficiency correlated with gene expression level.
Accumulating evidence indicates that transcription and splicing are coupled, i.e., Pol II transcription can help determine splice-site choice, and efficient assembly of an active spliceosome can change the extent of CTS (Godoy Herz et al., 2019; Naftelberg et al., 2015). We therefore asked whether CTS efficiency was related to the level of gene expression. For high-confidence analysis, we excluded data with a 5ʹSS or 3ʹSS ratio equal to or greater than one, because in these cases reads often could not be assigned unambiguously to an intron or to an alternative exon in the same region. The remaining data, i.e., 5ʹSS and 3ʹSS ratios between 0 and 1, were plotted against the corresponding gene expression level (as determined from the mRNA-seq data). Both the 5ʹSS and 3ʹSS ratios correlated negatively with the level of gene expression (Fig. 2A; Supplementary Fig. 2), suggesting that the pre-mRNAs of highly expressed genes were processed at a higher rate than those of weakly expressed genes.

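The filtering and comparison described above can be sketched as follows. This is an illustration only: the text does not state which correlation statistic was used, so a Spearman rank correlation is assumed here, and the input format (pairs of ratio and expression value) is hypothetical.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; None if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def ratio_expression_correlation(records):
    """records: iterable of (splice-site ratio, expression level) pairs.

    Ratios >= 1 are discarded, as in the text; the Spearman statistic
    (Pearson on ranks) is an assumption of this sketch.
    """
    kept = [(r, e) for r, e in records if 0 <= r < 1]
    if len(kept) < 2:
        return None
    ratios, expression = zip(*kept)
    return pearson(ranks(list(ratios)), ranks(list(expression)))


if __name__ == "__main__":
    toy = [(0.8, 1.0), (0.5, 5.0), (0.2, 20.0), (1.5, 3.0), (0.1, 50.0)]
    print(ratio_expression_correlation(toy))
```

A negative coefficient corresponds to the trend reported above: the higher the expression, the lower the ratio, i.e. the more efficient the CTS.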
CTS efficiency correlated with exon and/or intron numbers.
In mouse and Drosophila, gene structure influences splicing efficiency (Khodor et al., 2011; Khodor et al., 2012). Therefore, we plotted the 5ʹSS and 3ʹSS ratios against different gene features. The first and last introns were often removed less efficiently at the co-transcriptional level than internal introns (Supplementary Fig. 3A). Furthermore, the longer the distance between the intron 3ʹ end and the gene 3ʹ end, the more efficient the CTS (Supplementary Fig. 3B). This suggests that sufficient co-transcriptional time is beneficial for efficient CTS, consistent with previous reports in other organisms (Khodor et al., 2011; Khodor et al., 2012). We observed that both gene length and the number of introns/exons correlated negatively with the average 5ʹSS and 3ʹSS ratios (Fig. 2B and C; Supplementary Fig. 4A). Since intron number and gene length correlate with each other (Fig. 2D), we wondered which one was more relevant to the regulation of CTS efficiency. Surprisingly, the negative correlation between intron number and the 5ʹSS and 3ʹSS ratios remained, even for gene groups with fixed lengths (Fig. 2E; Supplementary Fig. 4B). By contrast, no obvious trend was found between gene length and the 5ʹSS or 3ʹSS ratio among genes with a fixed number of introns (Fig. 2F; Supplementary Fig. 4C and D). We also used a general linear model to statistically assess the effect of intron number and gene length on CTS efficiency (see Methods). The ANOVA test indicated that intron number was a significant factor (p < 2.2e-16), whereas gene length (p = 0.88) and the interaction between intron number and gene length (p = 0.84) were not. Furthermore, intron number did not correlate with gene expression level (Supplementary Fig. 4E), ruling out the possibility that intron number affects CTS via effects on gene expression. Unlike the case in animals, there was no obvious relationship between exon or intron length and the splicing efficiency of the exon or intron itself or of its adjacent exon (Fig. 2G and H). Overall, we conclude that the number of introns and/or exons is an important contributor to efficient CTS, independent of gene length.

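For orientation, a linear model with the three terms tested above could be written as below. This is only one plausible form, given as a sketch; the exact specification is in the Methods and may differ. The symbols are introduced here for illustration: r_g is the average 5ʹSS or 3ʹSS ratio of gene g, n_g its intron number, ℓ_g its length and ε_g the residual.

```latex
r_g = \beta_0 + \beta_1\, n_g + \beta_2\, \ell_g + \beta_3\, n_g \ell_g + \varepsilon_g
```

In this notation the ANOVA results reported above correspond to rejecting β1 = 0 (intron number, p < 2.2e-16) while finding no evidence against β2 = 0 (gene length, p = 0.88) or β3 = 0 (interaction, p = 0.84).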
CTS efficiency negatively correlated with certain histone marks.
Transcription and CTS both occur within the context of chromatin in vivo and are therefore interlinked (Aslanzadeh et al., 2018; de Almeida and Carmo-Fonseca, 2014; Ullah et al., 2018). We asked whether CTS efficiency is related to the levels of certain histone marks within the chromatin, i.e., H3K4me3, H3K9ac and H3K27me3, for which data from the same seedling stage are available in public databases (see Methods). There was a clear correlation between the 5ʹSS or 3ʹSS ratio and the levels of H3K4me3- or H3K9ac-marked histones (Fig. 3), but no correlation was found between CTS and the level of H3K27me3-marked histones. As previously reported and consistent with the “kinetic coupling” model, higher H3K4me3 or H3K9ac levels may indicate faster elongation (Church and Fleming, 2018; Nagai et al., 2017), which could in turn reduce the time Pol II spends on the chromatin, thereby reducing the efficiency of CTS.

Decisions on alternative splicing are often made co-transcriptionally.
Alternative splicing is an important regulatory step in gene expression in higher eukaryotes (Jangi and Sharp, 2014; Reddy et al., 2013; Staiger and Brown, 2013). Examples in mammalian cells showed that splicing of an alternative exon can happen either co- or post-transcriptionally (Pandya-Jones and Black, 2009; Saldi et al., 2016; Vargas et al., 2011). Furthermore, it was proposed that splicing commitment, but not necessarily catalysis, occurs at the chromatin (de la Mata et al., 2010). We therefore wondered to what extent alternative splicing is determined co-transcriptionally in plants. We focused on exon-skipping (ES), alternative 5ʹ splice site (A5SS) and alternative 3ʹ splice site (A3SS) events (Fig. 4A) that are annotated in the TAIR10 genome and can be assessed in our Col-0 mRNA-seq data (see Methods). For these events, reads spanning an exon-exon junction were used to determine the relative ratio of different isoforms from the same locus, expressed as a PSI value (Percentage Spliced In; see Methods). Unlike for intron retention (IR), the abundance of isoforms in these events can be calculated without the confounding effects of unprocessed introns in the CB-RNA. Comparing the mRNA-seq and CB-RNA-seq data, we found that the PSI values from the two samples were generally correlated (Fig. 4B), indicating that decisions on alternative splicing are often made at chromatin. It is worth noting that decisions made post-transcriptionally do exist at some individual loci (Fig. 4C). In addition, compared with constitutively removed introns at the same locus, introns involved in alternative splicing were generally removed less efficiently co-transcriptionally, although they were often not the least efficient (Fig. 4D). Therefore, our data indicate that for alternative splicing events including ES, A5SS and A3SS in Arabidopsis seedlings, the decision is often made at chromatin.

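A PSI value of the kind used above can be sketched from junction read counts as follows. The exact definition used in the study is given in the Methods; the version below is a common convention and is an assumption here, in which the junctions supporting each isoform are averaged before taking the percentage. The function and argument names are illustrative.

```python
def psi(inclusion_junction_reads, exclusion_junction_reads):
    """Percentage Spliced In from exon-exon junction read counts.

    inclusion_junction_reads: counts for junctions supporting the
        inclusion isoform (two junctions for a skipped exon, one for
        an alternative 5' or 3' splice site); must be non-empty.
    exclusion_junction_reads: counts for junctions supporting the
        alternative isoform; must be non-empty.
    Returns None when no junction read supports either isoform.
    """
    inclusion = sum(inclusion_junction_reads) / len(inclusion_junction_reads)
    exclusion = sum(exclusion_junction_reads) / len(exclusion_junction_reads)
    total = inclusion + exclusion
    if total == 0:
        return None
    return 100.0 * inclusion / total


if __name__ == "__main__":
    # Exon-skipping event: two inclusion junctions, one skipping junction.
    print(psi([30, 50], [10]))
```

Computing this value for the same event in the mRNA-seq and CB-RNA-seq samples and comparing the two is the comparison summarized in Fig. 4B. Because only exon-exon junction reads enter the calculation, unspliced intronic reads in the CB-RNA do not affect it.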
RZ-1B and RZ-1C enhance the efficiency of CTS globally.
We next tested how CTS is regulated by trans-factors. Among many candidates, we chose to test RZ-1B and RZ-1C as a starting point, given that they associate with chromatin and interact with a group of SR proteins (see Introduction; Wu et al., 2016b), the latter being important splicing regulators across different species. The effect of RZ-1B and RZ-1C on CTS was tested through CB-RNA-seq. Surprisingly, the distributions of the 5ʹSS and 3ʹSS ratios in the mutant shifted towards higher values and the means of the ratios increased, suggesting that the RZ-1 proteins positively regulate the CTS efficiency of a large number of genes (Fig. 5A; Supplementary Fig. 5A and B). Indeed, we found that 6,119 exon-intron-exon units from 3,046 genes were significantly differentially spliced at the co-transcriptional level in the rz-1b rz-1c mutant compared with Col-0 (false discovery rate (FDR) < 0.001; see Methods, Supplementary Table 2; Fig. 5B). Among them, 98% of units (5,971 of 6,119) were less efficiently spliced in rz-1b rz-1c. This general effect of RZ-1 on CTS efficiency contrasts with its relatively subtle effects on splicing based on mRNA-seq. The 5ʹSS or 3ʹSS ratio, of either all introns or those that were differentially spliced at chromatin, remained unchanged between Col-0 and rz-1b rz-1c based on mRNA-seq data (Fig. 5C; Supplementary Fig. 5C). Indeed, using the same criteria as for CB-RNA, we found only 502 exon-intron-exon units that were less efficiently spliced in rz-1b rz-1c compared with Col-0 based on mRNA-seq data (Supplementary Fig. 5D, yellow circle). These results suggest that although co-transcriptional splicing is slower in rz-1b rz-1c, the introns are generally removed effectively in mature mRNA. Taken together, the above results indicate that RZ-1B and RZ-1C are generally required for efficient CTS.

RZ-1C promotes CTS in association with its direct RNA-binding
We then searched for RNA targets bound by the RZ-1 proteins in vivo to further assess how these proteins regulate CTS. Crosslinking and immunoprecipitation (CLIP) analysis has been shown to be the most reliable method for faithfully detecting RNA-protein interactions; however, it is extremely challenging in plants (Meyer et al., 2017; Zhang et al., 2015). We combined the individual-nucleotide resolution CLIP (iCLIP) protocol (Konig et al., 2010) with the recently developed enhanced CLIP (eCLIP) protocol (Van Nostrand et al., 2016) and adapted it for our research (see Methods, Fig. 5D). We kept the radioisotope labeling of RNA from the original iCLIP protocol to ensure the faithful detection of RNA bound by RZ-1C (Fig. 5E). We then followed the eCLIP protocol for generation of the sequencing library because of its higher efficiency and reliability (Van Nostrand et al., 2016). RZ-1C fused with GFP and expressed in its genomic context (RZ-1C:GFP-RZ-1C) was able to complement the rz-1b rz-1c mutant (Wu et al., 2016b). This transgenic line and a 35S:GFP construct (as a negative control) were used in our eCLIP analysis (Fig. 5E). Peak calling and subtraction of any site that appeared in the 35S:GFP data were performed over biological replicates (Supplementary Fig. 6A, Supplementary Table 1), resulting in the identification of 2,295,498 RZ-1C cross-linked sites (XLS) in the Arabidopsis genome. Of these sites, 1,980,453 (86.3%), 90,257 (3.9%), 130,762 (5.7%) and 94,026 (4.1%) were located in exons, introns, 5ʹUTRs and 3ʹUTRs, respectively (Fig. 5F). These XLS were distributed over 10,937 genes (Supplementary Table 3), suggesting that RZ-1C binds to a broad range of RNAs in vivo. Furthermore, among the 3,046 genes identified above as differentially spliced on chromatin in the rz-1b rz-1c mutant (Fig. 5B), 84% had RZ-1C eCLIP XLS (Fig. 5G). This is significantly higher than the percentage of total genes bound by RZ-1C (38%, hypergeometric test, p < 2.2e-16) or the percentage of expressed genes (FPKM > 1) at the same stage (55%, hypergeometric test, p < 2.2e-16). This suggests a direct link between RZ-1C and the promotion of CTS at those genes. The binding of the RZ-1C protein to RNA as well as its effect on CTS were further validated on selected genes through RNA immunoprecipitation-qPCR (RIP-qPCR) and CB-RNA or mRNA extraction followed by qPCR analysis of splicing efficiency (Fig. 5H and I; Supplementary Fig. 6B and C). Notably, consistent with the sequencing results, no difference in splicing efficiency between WT and rz-1b rz-1c was observed in mRNA samples by qPCR analysis (Supplementary Fig. 7). Overall, our eCLIP data indicate that the RZ-1C protein promotes efficient CTS of several thousand genes through direct binding to their RNAs.

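The enrichment test mentioned above can be sketched as an upper-tail hypergeometric probability: the chance of seeing at least the observed number of RZ-1C-bound genes among the differentially spliced genes if bound genes were drawn at random. The implementation below is a generic illustration computed in log space, not the authors' code, and the argument names are ours.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  genes in the population bound by the protein
    draws:      genes in the tested set (e.g. differentially spliced)
    observed:   genes in the tested set that are bound
    """
    k_min = max(observed, draws - (population - successes), 0)
    k_max = min(successes, draws)
    if k_min > k_max:
        return 0.0
    log_denominator = log_comb(population, draws)
    log_terms = [
        log_comb(successes, k)
        + log_comb(population - successes, draws - k)
        - log_denominator
        for k in range(k_min, k_max + 1)
    ]
    largest = max(log_terms)
    # Sum in log space to avoid overflow; very small tails underflow to 0.0.
    return min(1.0, exp(largest) * sum(exp(t - largest) for t in log_terms))


if __name__ == "__main__":
    # Small toy example: 10 genes, 5 bound, 4 tested, at least 3 bound.
    print(hypergeom_upper_tail(10, 5, 4, 3))
```

With the figures given in the text one would use `successes=10937`, `draws=3046` and `observed=2572`; the population size (the total gene number used as background) is not stated in this section and would have to be taken from the Methods.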
RZ-1C binds to both introns and exons with different strength.
We then monitored the distribution of RZ-1C XLS across loci to further dissect the mechanism of RZ-1C-regulated CTS. RZ-1C XLS were located well within the region between the predicted transcription start sites (TSS) and polyadenylation sites (PAS) (Fig. 6A). As shown in Fig. 5F, the apparent enrichment of RZ-1C XLS at exons over introns led us to ask whether this could be due to the greater abundance of exons than introns at steady state within the cell. Since RZ-1C localizes exclusively within the nucleus and associates with chromatin (Wu et al., 2016b), the CB-RNA-seq data could serve as a good input control for our eCLIP analysis. To judge the binding strength of RZ-1C to different RNAs, we normalized the eCLIP data to the sequencing read density calculated from the CB-RNA-seq data. In general, RZ-1C still favored exons over introns after this normalization (Fig. 6B, green line). Among the 10,937 genes bound by RZ-1C, RZ-1C XLS were detected solely at exons, at both exons and introns, and solely at introns in 7,496, 3,370 and 71 genes, respectively (Fig. 6C). Importantly, RZ-1C-bound and unbound introns are similarly abundant in the CB-RNA-seq data, indicating that the absence of RZ-1C XLS from the introns of the 7,496 genes is not due to a different abundance of their introns (Fig. 6D). Interestingly, in cases where RZ-1C binds to both an intron and the adjacent exon, the intronic binding was substantially stronger than the exonic binding when their steady-state levels at chromatin were taken into account (Fig. 6B, black line). Therefore, RZ-1C binds only to exons at the majority of its targets, while in cases where it does bind to an intron, the binding is often stronger than that to the adjacent exons.

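The normalization described above can be illustrated with a minimal sketch that divides the eCLIP signal of each region by its CB-RNA-seq read density. The data layout (dictionaries keyed by region name) and the minimum-density cutoff `min_density` are assumptions made for this example; the actual procedure is described in the Methods.

```python
def normalized_binding(xls_counts, cb_rna_density, min_density=1.0):
    """eCLIP signal per region divided by CB-RNA-seq read density.

    xls_counts:     dict mapping region id -> number of cross-linked sites
    cb_rna_density: dict mapping region id -> CB-RNA-seq read density
    min_density:    hypothetical cutoff; regions with lower input density
                    are skipped because their ratio would be unreliable
    """
    strength = {}
    for region, xls in xls_counts.items():
        density = cb_rna_density.get(region, 0.0)
        if density < min_density:
            continue
        strength[region] = xls / density
    return strength


if __name__ == "__main__":
    xls = {"exon1": 40, "intron1": 10, "intron2": 5}
    density = {"exon1": 100.0, "intron1": 20.0, "intron2": 0.2}
    print(normalized_binding(xls, density))
```

In this toy input the intron has fewer raw cross-linked sites than the neighboring exon but a higher normalized value, which is the kind of situation described above for introns bound together with their adjacent exons.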
RZ-1C binds to unspliced exons more strongly than to spliced exons.
Given that RZ-1C binds only to exons at the majority of its targets, we further wondered whether its binding had any preference towards spliced or unspliced exons. We focused on exon-located XLS whose corresponding sequencing reads span either an exon-intron junction or an exon-exon junction. In these two cases, the splicing state of the RNA bound by RZ-1C can be determined faithfully. After normalization to the read density calculated from the CB-RNA-seq data at each corresponding location, exon-intron-spanning RNAs bound by RZ-1C were at higher levels than exon-exon-spanning RNAs (Fig. 6E). This result was further supported by RIP-qPCR at individual loci. We designed primers specific for different forms of RNA and quantified the strength of binding relative to the input level (chromatin RNA before the IP), and found that RZ-1C binding (as a percentage of input) was significantly higher at the exon-intron junction than at either the exon only or the exon–exon junction (Supplementary Fig. 8), suggesting stronger binding to unspliced exons.

U-rich and GA-rich motifs were enriched in sequences surrounding RZ-1C XLS.
357
Next, we searched for specific motifs to which RZ-1C could preferentially bind.
358
EM for Motif Elicitation (MEME, see method) analysis revealed several degenerative motifs
359
(Fig. 6F). Enrichment of U-rich motifs was most likely due to the preference of UV 15
Multiple
3:0
crosslinking at such sequences. Importantly, GA-rich motifs were also enriched, similar to
3:1
those previously identified in RZ-1C systematic evolution of ligands by exponential
3:2
enrichment (SELEX) experiments (Wu et al., 2016b), indicating that GA-rich sequences are
3:3
likely to be bound by RZ-1C with high affinity. Notably, GA-rich motifs are also bound by
3:4
other splicing regulators such as SR proteins (Clery et al., 2011; Xing et al., 2015) and are
3:5
important for splicing regulation across different species. Therefore, in case of RZ-1C, plants
3::
take advantage of similar cis-elements as in mammals to regulate co-transcriptional splicing.

RZ-1C promotes CTS through both local and global modes.
The above data suggest that RZ-1C plays a critical role in enhancing the efficiency of CTS, often through binding to exonic sequences, but do not fully elucidate its mode of action. We propose a model that summarizes two alternative, but not mutually exclusive, modes of action for RZ-1C in regulating CTS. In the first, “local” mode, RZ-1C would act as a canonical splicing factor, binding cis-elements in the exon or intron to define and facilitate the removal of the adjacent intron (Fig. 6J, blue arrow). In the second, “global” mode, RZ-1C would facilitate removal of an intron by binding to exons or introns distal to it (Fig. 6J, black arrow). We have evidence for both modes based on our CB-RNA-seq and eCLIP-seq data. A total of 2572 genes showed both differential CTS efficiency and RZ-1C binding (Fig. 5G). Within these 2572 genes, 5408 splicing-affected exon-intron-exon units were identified. Comparing RZ-1C binding in splicing-affected units with that in random units taken from the same genes, we found a higher percentage of RZ-1C binding in splicing-affected units, supporting the “local” mode of action (Fig. 6G). Consistent with our previous results, RZ-1C bound a higher percentage of exons than introns within the splicing-affected units (Fig. 6G). At the same time, a significant difference in intronic binding between splicing-affected units and random units was also observed (Fig. 6G), suggesting that intronic binding is also important for the promotion of CTS by RZ-1C. Further, 27 % of the 5408 units did not have any RZ-1C binding within the unit itself but still showed defective splicing (Fig. 6H, “Total”). In these cases, RZ-1C binding distal to the affected intron would be important for its efficient co-transcriptional removal, supporting the “global” mode. Thus, we propose a working model of RZ-1B and RZ-1C in the regulation of CTS, in which RZ-1C binds to exons and, in fewer cases, introns, and can promote the co-transcriptional removal of introns both proximal and distal to it (Fig. 6J).

Discussion
Numerous studies, including structural studies, have shown that splicing is a generally conserved process from yeast to human and can occur either during or after transcription. With our data added, current evidence indicates that co-transcriptional splicing is the predominant mode in all eukaryotes tested so far. In addition, several lines of our data are consistent with the kinetic coupling model between transcription and splicing proposed previously (see Introduction). CTS efficiency correlated with the distance between the intron 3’ end and the gene 3’ end, whereas it correlated negatively with active histone marks (H3K4me3 and acetylation) related to rapid transcriptional elongation. Notably, recent evidence showed that Pol II and the splicing machinery could also be coupled through the Pol II CTD phosphorylated on serine 5 (S5P). Native elongating transcript sequencing (NET-seq) analysis with an antibody against Pol II CTD S5P in mammalian cells showed enrichment of sequencing reads with their 3’ ends mapped precisely to the exon 3’ end, indicating that such reads could be derived from splicing intermediates that came not from the Pol II active center but from spliceosomes associated with Pol II CTD S5P (Nojima et al., 2015; Nojima et al., 2018).

How kinetic coupling is reconciled with the physical coupling regulated by dynamic phosphorylation of the Pol II CTD is an interesting question to be addressed in the future. Related to this, we examined how CTS efficiency relates to Pol II levels as judged by the high-resolution pNET-seq data published previously (Zhu et al., 2018). Interestingly, a positive correlation was observed between the 5’SS or 3’SS ratio and the steady-state levels of Pol II carrying CTD S5P and/or CTD S2P modifications (Supplementary Fig. 9). These data are consistent with the observed correlation between SS ratio and active histone marks, given that H3K4me3 and acetylation also indicate frequent Pol II initiation in addition to fast elongation (Wozniak and Strahl, 2014). Therefore, fast Pol II initiation and elongation could result in both higher Pol II levels and relatively low CTS efficiency (due to fast elongation), a scenario similar to that described for the FLC locus (Wu et al., 2016a). In addition, we noted a significant peak associated with Pol II CTD S5P, but not Pol II CTD S2P, at the junction of the exon 3’ end and the intron 5’ end (Supplementary Fig. 9), most likely representing splicing intermediates (Nojima et al., 2015; Zhu et al., 2018). Consistent with this explanation, such a peak is more pronounced in exon-intron-exon groups with low SS ratios, a scenario that would produce such splicing intermediates more frequently during transcription.

We found in this study that RZ-1C is a relatively specific regulator of CTS rather than an essential factor for splicing in general. Indeed, most CTS defects observed in rz-1b rz-1c were not associated with intron retention at the steady-state mRNA level (Fig. 5B, C; Supplementary Fig. 5B to D; Supplementary Fig. 7). In these cases, despite the slower rate of splicing on chromatin in rz-1b rz-1c, effective splicing seems to occur at the post-transcriptional level, ensuring successful intron removal before mRNA maturation. In addition, frequent intron retention may activate surveillance pathways that selectively degrade such transcripts (e.g. nonsense-mediated decay). Future work will be needed to distinguish between these possibilities. Nevertheless, RZ-1B and RZ-1C likely work at the interface between transcription and mRNA maturation, and the efficient CTS governed by RZ-1B/1C may be part of the pipeline that ensures successful mRNA maturation or fate determination.

An intriguing question for the future is whether RZ-1B/1C also affect the transcriptional elongation rate, as is the case for some mammalian SR proteins (Ji et al., 2013; Lin et al., 2008; Xiao et al., 2019). Our preliminary analysis supports this hypothesis. We observed a steeper 5’ to 3’ slope of CB-RNA in the rz-1b rz-1c double mutant compared with Col-0 (Supplementary Fig. 10). As explained in Fig. 1B, the shape of the 5’ to 3’ slope is tuned by the Pol II firing rate, elongation rate and termination rate. Notably, a change in initiation rate alone would not affect the shape of the 5’ to 3’ slope. Therefore, our data are consistent with the hypothesis that RZ-1B/1C affect Pol II dynamics globally. Notably, Pol II travels at an uneven speed across a gene (e.g. pausing at the exon-intron boundary); therefore, techniques such as pNET-seq will be required in the future to reveal the role of RZ-1B/1C in regulating Pol II activity at high spatial resolution, which may further link to their RNA binding and their roles in CTS regulation.

In mammals, introns are often an order of magnitude longer than exons; therefore, splicing is thought to proceed through initial recognition of an exon and its adjacent splice sites, a mechanism known as “exon definition” (De Conti et al., 2013). In contrast, plant gene splicing is generally considered to occur by intron definition, given the abundant retention of introns and their shorter length compared with mammals (Reddy, 2007). According to the intron definition model, introns and their 5’ and 3’ splice sites are recognized directly by the spliceosome. Apart from the apparent difference in intron length, what contributes to and determines intron or exon definition remains an unsolved question. In mammals, splicing factors such as SR proteins often bind to exonic splicing enhancers (ESEs) located in exons and thereby contribute to exon definition. Interestingly, as we report in this study, RZ-1C binds mainly to exons at the majority of its targets, although both its exonic and intronic binding are important for the regulation of CTS. Currently, it is unclear whether intron definition in plants is accompanied by a more frequent presence of regulatory motifs in introns. Systematic analysis of the RNA-binding features of splicing factors through eCLIP-seq, a reliable and powerful method that we adapted in this study, would likely provide an unbiased answer to this question. Notably, a recent study in yeast showed that the E complex of the spliceosome can adopt both intron and exon definition modes without significant conformational change, and provided evidence that exon definition can also occur in yeast (Li et al., 2019), a species previously considered to use intron definition. Therefore, it would be interesting to test whether exon definition and intron definition also co-exist in plants, in connection with the RNA-binding properties of splicing factors.

An unexpected discovery from our study is that exon and/or intron number correlated with efficient CTS. Consistent with our finding, a recent study based on long-read sequencing of chromatin-bound RNA revealed that a significant proportion of yeast nascent RNAs are either fully spliced or fully unspliced (Herzel et al., 2018). The removal of individual introns is likely to be coordinated rather than following a simple ‘first come, first served’ rule (Fededa et al., 2005; Tilgner et al., 2018). Although introns do contribute, our work highlights the role of exons in this mechanism. Exactly how multiple exons and introns promote CTS is an exciting next question to address. RZ-1B/1C seem to be part of this mechanism, as RZ-1B/1C-regulated exon-intron-exon units were more enriched in genes with a higher number of introns or exons (Fig. 6I). Besides RZ-1B/1C, other factors are likely also involved in this mechanism, as the trend between CTS efficiency and intron number remains when comparing RZ-1C-bound and unbound genes (Supplementary Fig. 11A), although CTS efficiency was generally higher for RZ-1C-bound genes (Supplementary Fig. 11B). Previous research showed that higher-order interactions exist among splicing factors and RNAs (Singh et al., 2012). This is consistent with the role of the RZ-1 proteins, which could promote CTS by being situated at exons and introns, even when distal to the affected intron. RZ-1C can form nuclear speckles and interact with multiple SR proteins (Wu et al., 2016b). Such interactions were confirmed in vivo through GFP-RZ-1C IP-MS; in addition, multiple other splicing factors were detected (Supplementary Table 4). An interesting hypothesis to test is whether there are multivalent interactions among exons, introns and splicing regulators such as RZ-1C and SR proteins, and whether such interactions contribute to the more efficient CTS of genes with higher numbers of introns (Fig. 6I). The role of Pol II in this process is another unanswered question awaiting future study. Overall, our work has revealed the co-transcriptional nature of plant gene splicing and highlighted the role of exon/intron numbers, as well as the role of RZ-1 proteins, in this process.

Methods
Plant materials and growth conditions
The Col-0 ecotype of Arabidopsis thaliana was used as the wild-type control in this study. The rz-1b rz-1c double mutant and RZ-1C:GFP-RZ-1C in rz-1b rz-1c have been described previously (Wu et al., 2016b). All experiments used seedlings grown on 1/2 Murashige and Skoog (MS) medium for 10 d at 22 °C under 16 h light/8 h dark cycles.

CB-RNA extraction and sequencing library construction
CB-RNA extraction from Arabidopsis seedlings was performed as described previously (Bhatt et al., 2012; Wu et al., 2016a; Wuarin and Schibler, 1994) (Supplementary Fig. 1A) and followed the guidelines previously proposed (Herzel and Neugebauer, 2015). In brief, nuclei from 2 g of ground seedlings were prepared using Honda buffer supplemented with RNase inhibitor and yeast tRNA to protect the integrity of the RNA. The nuclei pellet was first resuspended in an equal volume of resuspension buffer (50 % glycerol, 0.5 mM EDTA, 1 mM DTT, 25 mM Tris-HCl pH 7.5, 100 mM NaCl, 100 ng/µl tRNA, 0.01 U/µl RNasin®), followed by brief washing with two volumes of urea wash buffer (25 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 M urea, 0.5 mM EDTA, 1 mM DTT, 1 % Tween-20). The chromatin was pelleted by centrifugation at 8000 g for 1 min, resuspended in an equal volume of resuspension buffer and washed with one volume of urea wash buffer. After centrifugation, the resulting chromatin pellet was resuspended in 1 mL Trizol, and RNA was extracted in combination with the RNeasy Mini Kit (Qiagen). The integrity of the resulting CB-RNA was checked by gel electrophoresis and comparison with the pattern of total RNA. DNA copurified with CB-RNA was removed by two sequential digestions with TURBO DNase (Thermo). The resulting RNA was then used for further analysis.

For constructing the CB-RNA-seq library, the above CB-RNA was treated with Ribominus (Thermo) to remove rRNA following the kit instructions. Any contaminating polyadenylated RNAs were removed using Oligo d(T) cellulose beads (NEB). A strand-specific RNA-seq library was then constructed using the dUTP method. The resulting library was paired-end sequenced on the Illumina platform. A summary of all sequencing results is included in Supplementary Table 1.

CB-RNA and mRNA data analysis
The raw CB-RNA data were mapped to the Arabidopsis genome (TAIR10) with HISAT2 v2.0.5 (Pertea et al., 2016), and the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) of the genes were calculated by StringTie v1.3.4d (Pertea et al., 2016) with default parameters. Genes with an FPKM < 1 were filtered out, and the 5ʹSS or 3ʹSS ratio was then calculated for exon-intron-exon features to determine the co-transcriptional splicing frequency (Fig. 1G). CB-RNA-seq reads were counted in the 25 bp upstream (5ʹ exonic reads) or downstream (5ʹ intronic reads) of the exon-intron junction, as well as in the 25 bp downstream (3ʹ exonic reads) or upstream (3ʹ intronic reads) of the intron-exon junction. The ratio of the number of 5ʹ intronic reads to the number of 5ʹ exonic reads was calculated as the 5ʹSS ratio (Fig. 1G). The 3ʹSS ratio was calculated in a similar way. Differentially spliced introns between the wild type (WT) and the double mutant were identified using Fisher’s test as previously reported (Deng et al., 2010). Briefly, Fisher’s test was performed to identify differential representation of an intron relative to its two flanking exons. The test used normalized CB-RNA-seq or mRNA-seq read counts of each intron and its two flanking exons in the WT and rz-1b rz-1c, followed by Benjamini-Hochberg multiple comparison correction. An intron with an FDR less than 0.001 was considered differentially spliced. The ChIP-seq data of histone modifications from GSE28398 (Luo et al., 2013) were mapped by Bowtie v1.1.2 (Langmead et al., 2009), and only reads mapped to a unique position in the genome were kept for the subsequent analysis. All histone modification data were normalized by the total number of reads. The raw mRNA-seq data were mapped with TopHat v2.1.1 (Trapnell et al., 2012). Cufflinks v2.2.1 (Trapnell et al., 2012) was used to calculate the FPKM of genes. Cuffdiff v2.2.1 was used to identify significantly differentially expressed genes with the following criteria: FPKM > 1, fold change > 2 and FDR < 0.05.
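
The splice-site ratio and the differential-splicing test described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names and the example counts are hypothetical, the counts are assumed to be integers (Fisher's test needs integer counts, so normalized counts would have to be rounded), the two flanking exons are assumed to be summed into one exon count, and a two-sided test is assumed.

```python
from math import comb

def ss_ratio(intronic_reads, exonic_reads):
    """Splice-site ratio: intronic reads / exonic reads in the 25-bp windows."""
    if exonic_reads == 0:
        return None  # ratio is undefined without exonic coverage
    return intronic_reads / exonic_reads

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for one intron: intron vs. flanking exons, WT vs. mutant.
wt_intron, wt_exon = 20, 200
mut_intron, mut_exon = 60, 180
p_value = fisher_exact_two_sided(wt_intron, wt_exon, mut_intron, mut_exon)
```

In a genome-wide analysis, the p-values from all tested introns would be passed to `benjamini_hochberg` together, and introns with an adjusted value below 0.001 would be retained.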
The alternative splicing isoforms A3SS, A5SS, IR and ES were identified based on the TAIR10 gene annotation (Fig. 4). An individual AS event was kept for further analysis if it was supported by sequencing reads from our mRNA-seq data. “Percentage spliced in” (PSI), a widely used metric, was used to denote the fraction of alternative splicing isoforms (Katz et al., 2010). Briefly, PSI indicates the ratio of two isoforms expressed simultaneously in a tissue. A PSI towards 1 or 0 indicates that one isoform of an alternatively spliced gene is predominantly expressed. A PSI of 0.5 indicates equal expression of the two isoforms. PSI values were calculated using MISO v0.5.4 (Katz et al., 2010) in the CB-RNA and mRNA datasets. Read coverage in this study was calculated as the number of reads per million total sequencing reads per base pair.
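
As an illustration of how a PSI value is read, the following sketch computes a naive PSI from read counts supporting each of two isoforms. This is a simplifying assumption for illustration only: MISO estimates PSI with a probabilistic model rather than a plain read ratio, and the function name and counts are hypothetical.

```python
def naive_psi(inclusion_reads, exclusion_reads):
    """Fraction of isoform-informative reads that support the inclusion isoform."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no informative reads, PSI cannot be estimated
    return inclusion_reads / total

# Hypothetical counts: 30 reads support one isoform, 10 support the other.
psi_example = naive_psi(30, 10)
```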
To analyse the variance in CTS explained by intron number and gene length, a general linear model (GLM) was fitted as follows: CTS efficiency (calculated as the mean of the 5ʹ and 3ʹSS values in a gene) ~ intron number of the gene × gene length of the gene, followed by an ANOVA test. Only genes between 1 and 4 kb in length (10,382 of the 12,910 genes with available SS values) were used in the GLM analysis.
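
Written out, and assuming that “×” denotes both main effects plus their interaction (as in R model-formula notation; this is an assumption, because the fitting software is not named here), the model for a gene g would take the following form. The symbols are introduced only for illustration.

```latex
\mathrm{CTS}_g = \beta_0 + \beta_1 N_g + \beta_2 L_g + \beta_3 \, N_g L_g + \varepsilon_g
```

Here CTS_g is the mean of the 5ʹ and 3ʹSS values of gene g, N_g is its intron number, L_g is its length and ε_g is the residual. Under this reading, the ANOVA partitions the variance in CTS efficiency among the intron-number, gene-length and interaction terms.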

eCLIP and sequencing library construction
eCLIP was performed as previously described with modifications (Konig et al., 2010; Meyer et al., 2017; Van Nostrand et al., 2016; Zhang et al., 2015). In brief, plants grown on petri dishes for 10 d were gently harvested, immersed in a shallow layer (0.5 cm) of ice-cold 1x PBS and cross-linked on ice with UV light (245 nm) at 600 mJ/cm2, a dose shown to be sufficient for Arabidopsis seedlings (Meyer et al., 2017; Zhang et al., 2015). Then 2 g of liquid nitrogen-ground plant powder was lysed with lysis buffer, treated with DNase I/RNase I and sonicated to release the RNA/proteins. The resulting lysate was cleared by centrifugation, then subjected to immunoprecipitation with an anti-GFP antibody (Ab290, Abcam) and protein A beads (Dynabeads, Thermo). After the IP, the RNA/protein complex was washed extensively, including several washes with high-salt wash buffer (1 M NaCl), treated with polynucleotide kinase and alkaline phosphatase to remove the terminal phosphate, ligated to an RNA adaptor at the 3ʹ end and end-labeled with 32P. Ninety percent of the resulting complex was eluted using LDS sample buffer (Thermo), resolved on a 4–12 % NuPAGE (Thermo) gel and then transferred to a nitrocellulose membrane for autoradiography (~3 h). The remaining 10 % of the complex was run in parallel to check the enrichment of the target protein. The region above the size of the protein (from 70 kD to ~140 kD) was cut out based on the autoradiograph, and RNA was released by proteinase K digestion. The resulting RNA was further purified and reverse transcribed using SuperScript III (Thermo) with a primer complementary to the RNA adaptor. The resulting cDNA was purified and subjected to DNA adaptor ligation at its 3ʹ end. The final library was amplified from the cDNA with primers specific to both end adaptors (FP-F and FP-R) using Q5 Hot-Start DNA polymerase. For the GFP-RZ-1C eCLIP sample, 12 cycles were typically sufficient for final library amplification from cDNA. The resulting PCR products were purified and resolved on a 3 % low-melting-point agarose gel, and fragments of 175–400 bp were recovered from the gel and used for paired-end Illumina (PE150) sequencing. A summary of all sequencing results is shown in Supplementary Table 1. All oligos and adaptor sequences are listed in Supplementary Table 5.

eCLIP data analysis
eCLIP data were processed using iCount v2.0.1.dev (T. Curk, 2016). Given the principle of the eCLIP experiment, only R2 reads were used for binding site identification, although paired-end sequencing results were generated. In brief, we used the “demultiplex” function to remove the adapter sequence from reads and to extract the 10 bp random sequences, which were used to mark PCR duplicates. The clean reads were then aligned with the “mapstar” function. All binding sites were identified and quantified using the "xlsites" function (parameters: --group_by start --quant cDNA). Finally, the "peaks" function was used to identify significant binding sites (FDR < 0.05), and binding clusters were generated with the "clusters" function. For the five replicates generated by eCLIP, clusters were identified in each replicate separately and in a pool of the five replicates. Only clusters from the pooled replicates that were also identified in all five replicates were retained as binding clusters. Only binding sites found in the binding clusters were kept for the subsequent analyses. MEME-ChIP (Machanick and Bailey, 2011) was used to find motifs enriched within the binding sites.
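
The replicate filter described above can be sketched as follows. This is an illustrative assumption rather than the code used in this study: a pooled cluster is treated as "also identified" in a replicate when it overlaps a replicate cluster by at least one base on the same chromosome and strand, intervals are assumed to be half-open (start inclusive, end exclusive), and all names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, strand, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] == b[1] and a[2] < b[3] and b[2] < a[3]

def reproducible_clusters(pooled, replicates):
    """Keep pooled clusters that overlap at least one cluster in every replicate."""
    return [
        cluster
        for cluster in pooled
        if all(any(overlaps(cluster, rep_cluster) for rep_cluster in rep)
               for rep in replicates)
    ]

# Hypothetical clusters: the first pooled cluster is seen in all five replicates,
# the second only in replicates 1 and 2.
pooled = [("Chr1", "+", 100, 150), ("Chr1", "+", 500, 520)]
replicates = [
    [("Chr1", "+", 120, 160), ("Chr1", "+", 505, 515)],
    [("Chr1", "+", 90, 130), ("Chr1", "+", 510, 530)],
    [("Chr1", "+", 110, 140)],
    [("Chr1", "+", 100, 150)],
    [("Chr1", "+", 149, 170)],
]
retained = reproducible_clusters(pooled, replicates)
```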

qPCR validation of CB-RNA-seq data and mRNA-seq data
CB-RNA was extracted and purified as described above. mRNA was purified from total RNA using the Dynabeads mRNA Purification Kit (Thermo). In brief, 500 ng of chromatin-bound RNA was treated twice with TURBO DNase (Thermo) and used for cDNA synthesis with gene-specific primers and SuperScript III (Thermo) at 52 ºC for 1 h. In the case of mRNA, 50 ng of mRNA was used for cDNA synthesis. Two primer pairs were designed to measure splicing efficiency for each exon-intron-exon structure: one primer pair spanned the exon-exon junction to amplify spliced transcripts only, and the other spanned the intron-exon junction to amplify unspliced transcripts only. All the reverse primers were included in the reverse transcription reaction to ensure gene- and strand-specific detection of RNA. The resulting cDNA was diluted 30 times with water and used for quantitative PCR using a qTOWER3 84 (Jena) and SYBR Green Master Mix (Roche). For data normalization, the value obtained with the spliced primer pair was normalized to the value obtained with the unspliced primer pair. The values from three independent samples were normalized to the mean WT level. Final values are means ± S.E.M. from three independent samples. A ‘no reverse transcriptase’ control (- RT) was used to ensure that the values reflected the level of RNA and not DNA contamination. Primers used for cDNA synthesis and quantitative PCR are described in Supplementary Table 5.
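
The normalization steps above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and values. It assumes that the inputs are relative quantities already derived from the qPCR run (not raw Ct values) and that S.E.M. is the sample standard deviation divided by the square root of the number of samples.

```python
from math import sqrt
from statistics import mean, stdev

def spliced_over_unspliced(spliced, unspliced):
    """Per-sample ratio of the spliced-primer value to the unspliced-primer value."""
    return [s / u for s, u in zip(spliced, unspliced)]

def normalize_to_wt_mean(ratios, wt_ratios):
    """Express each ratio relative to the mean ratio of the WT samples."""
    wt_mean = mean(wt_ratios)
    return [r / wt_mean for r in ratios]

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical relative quantities from three independent samples per genotype.
wt = spliced_over_unspliced([8.0, 10.0, 12.0], [4.0, 5.0, 6.0])
mutant = spliced_over_unspliced([4.0, 5.0, 6.0], [5.0, 5.0, 5.0])
wt_mean_value, wt_sem = mean_and_sem(normalize_to_wt_mean(wt, wt))
mut_mean_value, mut_sem = mean_and_sem(normalize_to_wt_mean(mutant, wt))
```

With this scheme, the WT mean is 1 by construction, and a mutant value below 1 indicates a lower spliced-to-unspliced ratio than in the WT.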

RNA immunoprecipitation
Nuclei from 2 g of ground seedlings were prepared with Honda buffer as described in CB-RNA extraction and sequencing library construction. Nuclei pellets were resuspended in 2.5 volumes of Nuclei Lysis Buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1 % SDS, 1x protease inhibitor cocktail, 50 ng/µl tRNA) and sonicated in a Bioruptor (Diagenode) 15 times (30 s on/30 s off). Immunoprecipitation was performed by incubating 50 µL Dynabeads protein A (Thermo), 5 µg antibody (Anti-GFP, AB290, Abcam) and 1.2 mL diluted chromatin (containing 100 µL sonicated chromatin) at 4 ºC for 1.5 h. After the IP, the beads were washed four times in washing buffer (167 mM NaCl, 16.7 mM Tris pH 7.5, 1.2 mM EDTA, 0.8 % Triton X-100, 1x protease inhibitor, 20 U/mL RNase inhibitor). The RNA-protein complex was eluted and reverse cross-linked by adding 200 µL elution buffer (2 mM EDTA, 0.2 % SDS, Proteinase K) to the washed beads and incubating at 55 ºC overnight. The RNA was precipitated, dissolved, DNase treated and then used as a template for reverse transcription with gene/strand-specific primers. The values presented in the figure are means ± S.E.M. from three independent samples and are shown as IP/1 % of input (RNA). All the primers used are listed in Supplementary Table 5.
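
One common way to express a RIP-qPCR signal as IP/1 % of input is sketched below. The conversion from Ct values is an assumption for illustration (the exact calculation is not specified here): it presumes that a 1 % input aliquot was measured alongside the IP and that each PCR cycle doubles the product. The function name and Ct values are hypothetical.

```python
def ip_over_one_percent_input(ct_ip, ct_input_1pct, amplification=2.0):
    """IP signal relative to a 1 % input aliquot, computed from qPCR Ct values."""
    # A lower Ct means more template; each cycle of difference corresponds to
    # one amplification-fold step (2-fold at 100 % efficiency).
    return amplification ** (ct_input_1pct - ct_ip)

# Hypothetical example: the IP crosses threshold two cycles after the 1 % input.
ratio_example = ip_over_one_percent_input(ct_ip=27.0, ct_input_1pct=25.0)
```

Because the denominator is 1 % of the input, this ratio is numerically equal to the percentage of total input recovered in the IP.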

Immunoprecipitation and Mass Spectrometry
Immunoprecipitation and mass spectrometry analysis were done as previously described with modifications (Questa et al., 2016; Wang et al., 2014). For immunoprecipitation, 5 g of RZ-1C:GFP-RZ-1C and 35S:GFP seedlings were harvested and ground to a fine powder in liquid nitrogen. Two volumes of extraction buffer (20 mM Tris–HCl pH 8, 150 mM NaCl, 2.5 mM EDTA, 1 % Triton X-100 and 2x protease inhibitor (Roche)) were added to the ground powder, followed by gentle rotation at 4 °C for 10 min. The supernatant was recovered after three sequential centrifugations at 14000 rpm for 5 min. Pre-washed GFP-Trap agarose beads (Chromotek) were incubated with the supernatant for 2 h at 4 °C with rotation, followed by five washes with 1 mL extraction buffer. The resulting beads were boiled in 1x LDS sample buffer (Thermo). Five percent of the resulting samples were loaded on a 10 % SDS-PAGE gel and transferred to a PVDF membrane, followed by western blotting using an anti-GFP antibody (AB290, Abcam).
For mass spectrometry analysis, protein samples from the above steps were loaded onto an SDS-PAGE gel (10 % stacking gels) and run until all the loading dye had migrated into the gel. The band was cut out as a whole, washed, reduced and alkylated, followed by trypsin digestion as previously reported (Shevchenko et al., 2006). Peptides were extracted using 5 % formic acid/50 % acetonitrile, dried down and re-dissolved in 0.1 % TFA. LC-MS/MS analysis was done using a Synapt G2Si mass spectrometer (Waters) combined with a nanoAcquity™ UPLC™ system (Waters) running a BEH C18 column (1.7 µm, 75 µm x 250 mm, Waters). Raw files from the Synapt G2Si were processed with the Protein Lynx Global Server (PLGS 3.0.2) software (Waters) and searched against the TAIR10_pep_20101214 database (www.arabidopsis.org). Results were exported as Excel files using the Ion Accounting Output (IdentityE).

Data Availability
All data generated in this study were deposited in the public domain as follows: Gene Expression Omnibus data accession: GSE128619 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128619).

Acknowledgements
We thank Dr. Caroline Dean for her generous support of this work. We thank Dr. Hongwei Guo, Dr. Yu Zhou, Dr. Suomeng Dong, Dr. Robert Ietswaart and Dr. Zhicheng Dong for discussions and suggestions; Dr. Shaofang Li and Dr. Xuemei Chen for communication and coordination; and Dr. Xiaofeng Cao and Dr. Jernej Ule for suggestions on the CLIP experiment. This work was supported by the Guangdong Innovation Research Team Fund (2016ZT06S172), the Shenzhen Sci-Tech Fund (KYTDPT20181011104005) and the National Natural Science Foundation of China (31771365 to ZW and 31800268 to DLZ).

Author Contributions
ZW, YFW, HYG and LJQ designed the research; DLZ, FM, XYL and ZW performed the research; FM, DLZ, LFG, YFW and ZW analysed the data; ZW and YFW wrote the paper.

References
Alexander, R.D., Barrass, J.D., Dichtl, B., Kos, M., Obtulowicz, T., Robert, M.C., Koper, M., Karkusiewicz, I., Mariconti, L., Tollervey, D., et al. (2010). RiboSys, a high-resolution, quantitative approach to measure the in vivo kinetics of pre-mRNA splicing and 3'-end processing in Saccharomyces cerevisiae. RNA 16:2570-2580.
Alexandrov, A., Colognori, D., Shu, M.D., and Steitz, J.A. (2012). Human spliceosomal protein CWC22 plays a role in coupling splicing to exon junction complex deposition and nonsense-mediated decay. Proc Natl Acad Sci U S A 109:21313-21318.
Ameur, A., Zaghlool, A., Halvardson, J., Wetterbom, A., Gyllensten, U., Cavelier, L., and Feuk, L. (2011). Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat Struct Mol Biol 18:1435-1440.
Aslanzadeh, V., Huang, Y., Sanguinetti, G., and Beggs, J.D. (2018). Transcription rate strongly affects splicing fidelity and cotranscriptionality in budding yeast. Genome Res 28:203-213.
Barbosa, I., Haque, N., Fiorini, F., Barrandon, C., Tomasetto, C., Blanchette, M., and Le Hir, H. (2012). Human CWC22 escorts the helicase eIF4AIII to spliceosomes and promotes exon junction complex assembly. Nat Struct Mol Biol 19:983-990.
Bauren, G., and Wieslander, L. (1994). Splicing of Balbiani ring 1 gene pre-mRNA occurs simultaneously with transcription. Cell 76:183-192.
Bentley, D.L. (2014). Coupling mRNA processing with transcription in time and space. Nat Rev Genet 15:163-175.
Beyer, A.L., and Osheim, Y.N. (1988). Splice site selection, rate of splicing, and alternative splicing on nascent transcripts. Genes Dev 2:754-765.
Bhatt, D.M., Pandya-Jones, A., Tong, A.J., Barozzi, I., Lissner, M.M., Natoli, G., Black, D.L., and Smale, S.T. (2012). Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. Cell 150:279-290.
Brugiolo, M., Herzel, L., and Neugebauer, K.M. (2013). Counting on co-transcriptional splicing. F1000Prime Rep 5:9.
Clery, A., Jayne, S., Benderska, N., Dominguez, C., Stamm, S., and Allain, F.H. (2011). Molecular basis of purine-rich RNA recognition by the human SR-like protein Tra2-beta1. Nat Struct Mol Biol 18:443-450.
Church, M.C., and Fleming, A.B. (2018). A role for histone acetylation in regulating transcription elongation. Transcription 9:225-232.
Das, R., Yu, J., Zhang, Z., Gygi, M.P., Krainer, A.R., Gygi, S.P., and Reed, R. (2007). SR proteins function in coupling RNAP II transcription to pre-mRNA splicing. Mol Cell 26:867-881.
de Almeida, S.F., and Carmo-Fonseca, M. (2014). Reciprocal regulatory links between cotranscriptional splicing and chromatin. Semin Cell Dev Biol 32:2-10.
De Conti, L., Baralle, M., and Buratti, E. (2013). Exon and intron definition in pre-mRNA splicing. Wiley Interdiscip Rev RNA 4:49-60.
de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M., Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow RNA polymerase II affects alternative splicing in vivo. Mol Cell 12:525-532.
de la Mata, M., and Kornblihtt, A.R. (2006). RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat Struct Mol Biol 13:973-980.
de la Mata, M., Lafaille, C., and Kornblihtt, A.R. (2010). First come, first served revisited: factors affecting the same alternative splicing event have different effects on the relative rates of intron removal. RNA 16:904-912.
Deng, X., Gu, L., Liu, C., Lu, T., Lu, F., Lu, Z., Cui, P., Pei, Y., Wang, B., Hu, S., et al. (2010). Arginine methylation mediated by the Arabidopsis homolog of PRMT5 is essential for proper pre-mRNA splicing. Proc Natl Acad Sci U S A 107:19114-19119.
Dolata, J., Guo, Y., Kolowerzo, A., Smolinski, D., Brzyzek, G., Jarmolowski, A., and Swiezewski, S. (2015). NTR1 is required for transcription elongation checkpoints at alternative exons in Arabidopsis. EMBO J 34:544-558.
Drexler, H.L., Choquet, K., and Churchman, L.S. (2019). Human co-transcriptional splicing kinetics and coordination revealed by direct nascent RNA sequencing. bioRxiv 611020. https://doi.org/10.1101/611020.
Fededa, J.P., Petrillo, E., Gelfand, M.S., Neverov, A.D., Kadener, S., Nogues, G., Pelisch, F., Baralle, F.E., Muro, A.F., and Kornblihtt, A.R. (2005). A polar mechanism coordinates different regions of alternative splicing within a single gene. Mol Cell 19:393-404.
Godoy Herz, M.A., Kubaczka, M.G., Brzyzek, G., Servi, L., Krzyszton, M., Simpson, C., Brown, J., Swiezewski, S., Petrillo, E., and Kornblihtt, A.R. (2019). Light Regulates Plant Alternative Splicing through the Control of Transcriptional Elongation. Mol Cell.
Hanano, S., Sugita, M., and Sugiura, M. (1996). Isolation of a novel RNA-binding protein and its association with a large ribonucleoprotein particle present in the nucleoplasm of tobacco cells. Plant Mol Biol 31:57-68.
Herzel, L., and Neugebauer, K.M. (2015). Quantification of co-transcriptional splicing from RNA-Seq data. Methods 85:36-43.
Herzel, L., Straube, K., and Neugebauer, K.M. (2018). Long-read sequencing of nascent RNA reveals coupling among RNA processing events. Genome Res 28:1008-1019.
Huranova, M., Ivani, I., Benda, A., Poser, I., Brody, Y., Hof, M., Shav-Tal, Y., Neugebauer, K.M., and Stanek, D. (2010). The differential interaction of snRNPs with pre-mRNA reveals splicing kinetics in living cells. J Cell Biol 191:75-86.
Ietswaart, R., Rosa, S., Wu, Z., Dean, C., and Howard, M. (2017). Cell-Size-Dependent transcription of FLC and its antisense long non-coding RNA COOLAIR explain cell-to-cell expression variation. Cell Syst 4:622-635.
Jangi, M., and Sharp, P.A. (2014). Building robust transcriptomes with master splicing factors. Cell 159:487-498.
Jeong, S. (2017). SR Proteins: Binders, Regulators, and Connectors of RNA. Mol Cells 40:1-9.
Ji, X., Zhou, Y., Pandit, S., Huang, J., Li, H., Lin, C.Y., Xiao, R., Burge, C.B., and Fu, X.D. (2013). SR proteins collaborate with 7SK and promoter-associated nascent RNA to release paused polymerase. Cell 153:855-868.
774 775 77: 777 778 779 780 781 782 783 784 785 78: 787 788 789 790 791 792 793 794 795 79: 797 798 799 800 801 802 803 804 805 80: 807 808 809 810 811 812 813 814 815 81: 817 818 819 820 821 822 823 824 825 82: 827
Katz, Y., Wang, E.T., Airoldi, E.M., and Burge, C.B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7:1009-1015. Khodor, Y.L., Rodriguez, J., Abruzzi, K.C., Tang, C.H., Marr, M.T., 2nd, and Rosbash, M. (2011). Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila. Genes Dev 25:2502-2512. Khodor, Y.L., Menet, J.S., Tolan, M., and Rosbash, M. (2012). Cotranscriptional splicing efficiency differs dramatically between Drosophila and mouse. RNA 18:2174-2186. Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., Luscombe, N.M., and Ule, J. (2010). iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17:909-915. Kornblihtt, A.R. (2007). Coupling transcription and alternative splicing. Adv Exp Med Biol 623:175-189. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. Li, X., Liu, S., Zhang, L., Issaian, A., Hill, R.C., Espinosa, S., Shi, S., Cui, Y., Kappel, K., Das, R., et al. (2019). A unified mechanism for intron and exon definition and back-splicing. Nature 573:375-380. Lin, S., Coutinho-Mansfield, G., Wang, D., Pandit, S., and Fu, X.D. (2008). The splicing factor SC35 has an active role in transcriptional elongation. Nat Struct Mol Biol 15:819-826. Lorkovic, Z.J., and Barta, A. (2002). Genome analysis: RNA recognition motif (RRM) and K homology (KH) domain RNA-binding proteins from the flowering plant Arabidopsis thaliana. Nucleic Acids Res 30:623-635. Luo, C., Sidote, D.J., Zhang, Y., Kerstetter, R.A., Michael, T.P., and Lam, E. (2013). Integrative analysis of chromatin states in Arabidopsis identified potential regulatory mechanisms for natural antisense transcript production. Plant J 73:77-90. Machanick, P., and Bailey, T.L. (2011). 
MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27:1696-1697. Marondedze, C., Thomas, L., Serrano, N.L., Lilley, K.S., and Gehring, C. (2016). The RNA-binding protein repertoire of Arabidopsis thaliana. Sci Rep 6:29766. Marquardt, S., Raitskin, O., Wu, Z., Liu, F., Sun, Q., and Dean, C. (2014). Functional consequences of splicing of the antisense transcript COOLAIR on FLC transcription. Mol Cell 54:156-165. Meyer, K., Koster, T., Nolte, C., Weinholdt, C., Lewinski, M., Grosse, I., and Staiger, D. (2017). Adaptation of iCLIP to plants determines the binding landscape of the clock-regulated RNA-binding protein AtGRP7. Genome Biol 18:204. Monsalve, M., Wu, Z., Adelmant, G., Puigserver, P., Fan, M., and Spiegelman, B.M. (2000). Direct coupling of transcription and mRNA processing through the thermogenic coactivator PGC-1. Mol Cell 6:307-316. Naftelberg, S., Schor, I.E., Ast, G., and Kornblihtt, A.R. (2015). Regulation of alternative splicing through coupling with transcription and chromatin structure. Annu Rev Biochem 84:165-198. Nagai, S., Davis, R.E., Mattei, P.J., Eagen, K.P., and Kornberg, R.D. (2017). Chromatin potentiates transcription. Proc Natl Acad Sci U S A 114:1536-1541. Nojima, T., Gomes, T., Grosso, A.R.F., Kimura, H., Dye, M.J., Dhir, S., Carmo-Fonseca, M., and Proudfoot, N.J. (2015). Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell 161:526-540. Nojima, T., Rebelo, K., Gomes, T., Grosso, A.R., Proudfoot, N.J., and Carmo-Fonseca, M. (2018). RNA Polymerase II Phosphorylated on CTD Serine 5 Interacts with the Spliceosome during Co-transcriptional Splicing. Mol Cell 72:369-379 e364. Oesterreich, F.C., Herzel, L., Straube, K., Hujer, K., Howard, J., and Neugebauer, K.M. (2016). Splicing of nascent RNA coincides with intron exit from RNA polymerase II. Cell 165:372-381. Osheim, Y.N., Miller, O.L., Jr., and Beyer, A.L. (1985). RNP particles at splice junction sequences on Drosophila chorion transcripts. 
Cell 43:143-151. Pandya-Jones, A., and Black, D.L. (2009). Co-transcriptional splicing of constitutive and alternative exons. RNA 15:1896-1908. 31
828 829 830 831 832 833 834 835 83: 837 838 839 840 841 842 843 844 845 84: 847 848 849 850 851 852 853 854 855 85: 857 858 859 8:0 8:1 8:2 8:3 8:4 8:5 8:: 8:7 8:8 8:9 870 871 872 873 874 875 87: 877 878 879 880 881
Pertea, M., Kim, D., Pertea, G.M., Leek, J.T., and Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650-1667. Questa, J.I., Song, J., Geraldo, N., An, H., and Dean, C. (2016). Arabidopsis transcriptional repressor VAL1 triggers Polycomb silencing at FLC during vernalization. Science 353:485-488. Reddy, A.S. (2007). Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annu Rev Plant Biol 58:267-294. Reddy, A.S., Marquez, Y., Kalyna, M., and Barta, A. (2013). Complexity of the alternative splicing landscape in plants. Plant Cell 25:3657-3683. Reddy, A.S., and Shad Ali, G. (2011). Plant serine/arginine-rich proteins: roles in precursor messenger RNA splicing, plant development, and stress responses. Wiley Interdiscip Rev RNA 2:875-889. Reichel, M., Liao, Y., Rettel, M., Ragan, C., Evers, M., Alleaume, A.M., Horos, R., Hentze, M.W., Preiss, T., and Millar, A.A. (2016). In Planta Determination of the mRNA-Binding Proteome of Arabidopsis Etiolated Seedlings. Plant Cell 28:2435-2452. Roberts, G.C., Gooding, C., Mak, H.Y., Proudfoot, N.J., and Smith, C.W. (1998). Co-transcriptional commitment to alternative splice site selection. Nucleic Acids Res 26:5568-5572. Rosa, S., Duncan, S., and Dean, C. (2016). Mutually exclusive sense-antisense transcription at FLC facilitates environmentally induced gene repression. Nat Commun 7:13031. Saldi, T., Cortazar, M.A., Sheridan, R.M., and Bentley, D.L. (2016). Coupling of RNA polymerase II transcription elongation with pre-mRNA splicing. J Mol Biol 428:2623-2635. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V., and Mann, M. (2006). In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 1:2856-2860. Singh, G., Kucukural, A., Cenik, C., Leszyk, J.D., Shaffer, S.A., Weng, Z., and Moore, M.J. (2012). The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus. 
Cell 151:915-916. Singh, J., and Padgett, R.A. (2009). Rates of in situ transcription and splicing in large human genes. Nat Struct Mol Biol 16:1128-1133. Staiger, D., and Brown, J.W. (2013). Alternative splicing at the intersection of biological timing, development, and stress responses. Plant Cell 25:3640-3656. Steckelberg, A.L., Boehm, V., Gromadzka, A.M., and Gehring, N.H. (2012). CWC22 connects pre-mRNA splicing and exon junction complex assembly. Cell Rep 2:454-461. T. Curk, G.R., Č. Gorup, J. Zmrzlikar, J. Konig, Y. Sugimoto, N. Haberman, G. Bobojević, C. Hauer, M. Hentze, B. Zupan, J. Ule,. (2016). iCount: protein-RNA interaction iCLIP data analysis (in preparation). Tilgner, H., Knowles, D.G., Johnson, R., Davis, C.A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T.R., and Guigo, R. (2012). Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22:1616-1625. Tilgner, H., Jahanbani, F., Gupta, I., Collier, P., Wei, E., Rasmussen, M., and Snyder, M. (2018). Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome. Genome Res 28:231-242. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562-578. Ullah, F., Hamilton, M., Reddy, A.S.N., and Ben-Hur, A. (2018). Exploring the relationship between intron retention and chromatin accessibility in plants. BMC Genomics 19:21. Van Nostrand, E.L., Pratt, G.A., Shishkin, A.A., Gelboin-Burkhart, C., Fang, M.Y., Sundararaman, B., Blue, S.M., Nguyen, T.B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13:508-514. 
Vargas, D.Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S.A., Schedl, P., and Tyagi, S. (2011). Single-molecule imaging of transcriptionally coupled and uncoupled splicing. Cell 147:1054-1065. 32
882 883 884 885 88: 887 888 889 890 891 892 893 894 895 89: 897 898 899 900 901 902 903 904 905 90: 907 908
Wang, Z.W., Wu, Z., Raitskin, O., Sun, Q., and Dean, C. (2014). Antisense-mediated FLC transcriptional repression requires the P-TEFb transcription elongation factor. Proc Natl Acad Sci U S A 111:7468-7473. Wozniak, G.G., and Strahl, B.D. (2014). Hitting the 'mark': interpreting lysine methylation in the context of active transcription. Biochim Biophys Acta 1839:1353-1361. Wu, Z., Ietswaart, R., Liu, F., Yang, H., Howard, M., and Dean, C. (2016a). Quantitative regulation of FLC via coordinated transcriptional initiation and elongation. Proc Natl Acad Sci U S A 113:218-223. Wu, Z., Zhu, D., Lin, X., Miao, J., Gu, L., Deng, X., Yang, Q., Sun, K., Zhu, D., Cao, X., et al. (2016b). RNA Binding Proteins RZ-1B and RZ-1C Play Critical Roles in Regulating Pre-mRNA Splicing and Gene Expression during Development in Arabidopsis. Plant Cell 28:55-73. Wuarin, J., and Schibler, U. (1994). Physical isolation of nascent RNA chains transcribed by RNA polymerase II: evidence for cotranscriptional splicing. Mol Cell Biol 14:7219-7225. Xiao, R., Chen, J.Y., Liang, Z., Luo, D., Chen, G., Lu, Z.J., Chen, Y., Zhou, B., Li, H., Du, X., et al. (2019). Pervasive Chromatin-RNA Binding Protein Interactions Enable RNA-Based Regulation of Transcription. Cell 178:107-121 e118. Xing, D., Wang, Y., Hamilton, M., Ben-Hur, A., and Reddy, A.S. (2015). Transcriptome-wide identification of RNA targets of Arabidopsis SERINE/ARGININE-RICH45 uncovers the unexpected roles of this RNA-binding protein in RNA processing. Plant Cell 27:3294-3308. Zhang, G., Taneja, K.L., Singer, R.H., and Green, M.R. (1994). Localization of pre-mRNA splicing in mammalian nuclei. Nature 372:809-812. Zhang, Y., Gu, L., Hou, Y., Wang, L., Deng, X., Hang, R., Chen, D., Zhang, X., Zhang, Y., Liu, C., et al. (2015). Integrative genome-wide analysis reveals HLP1, a novel RNA-binding protein, regulates plant flowering by targeting alternative polyadenylation. Cell Res 25:864-876. Zhu, J., Liu, M., Liu, X., and Dong, Z. (2018). 
RNA polymerase II activity revealed by GRO-seq and pNET-seq in Arabidopsis. Nat Plants 4:1112-1123.
909 910
Figure Legends
Fig. 1 Widespread CTS in Arabidopsis seedlings. A. A cartoon illustrating the principle of CB-RNA extraction. TSS: transcription start site; PAS: polyadenylation site. B. A
diagram describing the two fractions of CB-RNAs. The pattern of elongating RNA ($\mathrm{RNA}_e$) along the gene is determined by the transcription initiation rate ($F$) and the elongation rate ($v$); the level at position $x$ is $\mathrm{RNA}_e(x) = \frac{F}{v}(L - x)$, where $L$ is the distance from TSS to PAS. RNA that has finished transcription but has not been released ($\mathrm{RNA}_f$) gives a flat pattern along the gene and can be expressed as $\mathrm{RNA}_f = F / k_r$, where $k_r$ is the release rate of RNA. C. A diagram illustrating co-transcriptional splicing (upper panel) and the expected pattern of CB-RNA-seq across a gene with multiple introns (lower panel). For B and C, yellow circles and blue lines indicate Pol II and nascent RNA, respectively.
D and E. Visualization of CB-RNA-seq and mRNA-seq results for three selected examples. A 5ʹ to 3ʹ declining slope is seen in the CB-RNA-seq profile (blue) but not in the mRNA-seq profile (red). The gene structure is indicated at the bottom as grey bars. F. Meta-gene analysis indicating widespread CTS. The figure shows the CB-RNA-seq and mRNA-seq read coverage along exon-intron-exon features. G. A diagram showing the definition of the 5ʹSS and 3ʹSS ratios. H. Mean and distribution of 5ʹSS and 3ʹSS ratios in Arabidopsis seedlings (Col-0). For all box plots in this study, the box shows the median and the 25th and 75th percentiles, and the sample size is indicated where it is less than 1000.
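The two-fraction picture in panel B of Fig. 1 can be illustrated with a minimal numerical sketch. It assumes a constant initiation rate and a constant elongation rate, so that a polymerase spends (L - x)/v between position x and the PAS and the elongating fraction falls linearly towards the 3ʹ end, while released-pending full-length RNA adds a flat offset. The function names, parameter names and example values are illustrative assumptions, not quantities reported in this study.

```python
def elongating_rna(x, gene_length, init_rate, elong_rate):
    """Steady-state level of nascent RNA that covers position x (bp from the TSS).

    Assumes constant initiation and elongation rates (a simplifying assumption).
    """
    if not 0 <= x <= gene_length:
        raise ValueError("x must lie between the TSS (0) and the PAS (gene_length)")
    return init_rate * (gene_length - x) / elong_rate


def full_length_rna(init_rate, release_rate):
    """Steady-state level of finished but not yet released RNA (flat along the gene)."""
    return init_rate / release_rate


def cb_rna_profile(positions, gene_length, init_rate, elong_rate, release_rate):
    """Expected chromatin-bound RNA level at each position: elongating + full-length."""
    flat = full_length_rna(init_rate, release_rate)
    return [elongating_rna(x, gene_length, init_rate, elong_rate) + flat
            for x in positions]


# Hypothetical values: 1 initiation/min, 50 bp/min elongation, 0.1/min release.
profile = cb_rna_profile([0, 500, 1000], gene_length=1000,
                         init_rate=1.0, elong_rate=50.0, release_rate=0.1)
print(profile)
```

With these made-up rates the profile declines linearly from the TSS to the PAS, where only the flat full-length contribution remains.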
Fig. 2 CTS efficiency is correlated with gene expression level and intron or exon numbers, independent of gene length.
A. The relationship between CTS efficiency and gene expression level. Genes were divided into five groups according to their expression level (measured as Fragments Per Kilobase of transcript per Million mapped reads, FPKM), from highest to lowest at 20% intervals along the X-axis; the corresponding 5ʹSS and 3ʹSS values are shown on the Y-axis. B. The relationship between CTS efficiency and gene length. Genes were divided into five groups according to their length, from longest to shortest at 20% intervals along the X-axis; the corresponding 5ʹSS and 3ʹSS values are shown on the Y-axis. C. The relationship between CTS efficiency and intron number. Gene groups with different intron numbers are shown along the X-axis. D. Scatter plot indicating the correlation between gene length and intron number. The intensity of the blue colour indicates the density of data points. Black dots indicate outliers. E. The relationship between CTS efficiency and intron number at a fixed gene length. The panel is presented as in C, except that only genes with similar lengths were analysed. F. The relationship between CTS efficiency and gene length at fixed intron numbers. The panel is presented as in B, except that only genes with the same number of introns were analysed. The total sample sizes for genes with one, two and three introns are 1453, 1540 and 1445, respectively. G. The relationship between exon length and the CTS efficiency of the adjacent intron. Exons were divided into five groups at 20% intervals according to their length and displayed along the X-axis. H. The relationship between intron length and the CTS efficiency of the intron itself. Introns were divided into five groups at 20% intervals according to their length and displayed along the X-axis. The layout is the same as in G, except that introns rather than exons were analysed. The sample size is indicated where it is less than 1000.
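The grouping at 20% intervals used throughout Fig. 2 can be sketched as a rank-based split into five equal-sized bins, followed by a per-bin summary of the SS ratio. How ties and uneven bin sizes were handled is not stated in the legend, so the rank-based rule, the function names and the toy numbers below are assumptions for illustration only.

```python
from statistics import median


def quintile_groups(values):
    """Assign each item to group 0 (highest 20%) ... 4 (lowest 20%) by rank."""
    n = len(values)
    # Indices sorted from the largest value to the smallest.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 5 // n
    return groups


def median_by_group(groups, ratios):
    """Median SS ratio within each group, keyed by group index."""
    buckets = {}
    for g, r in zip(groups, ratios):
        buckets.setdefault(g, []).append(r)
    return {g: median(rs) for g, rs in sorted(buckets.items())}


# Toy values (not data from this study): one FPKM and one SS ratio per gene.
fpkm = [10, 50, 30, 20, 40, 5, 60, 70, 80, 90]
ss_ratio = [0.5, 0.7, 0.6, 0.6, 0.7, 0.4, 0.8, 0.8, 0.9, 0.9]
groups = quintile_groups(fpkm)
print(groups)
print(median_by_group(groups, ss_ratio))
```

The same helper applies to gene length, exon length or intron length by passing that quantity instead of FPKM.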
Fig. 3 Relationships between different histone modifications and CTS efficiency.
Levels (Y-axis) of H3K4me3 (left panels), H3K9ac (middle panels) and H3K27me3 (right panels) that correspond to different gene features (X-axis) are indicated. Different line colours indicate the groups of introns classified based on their average 5ʹSS (top panels) or 3ʹSS (bottom panels) ratios, from highest to lowest with 20% intervals between adjacent groups. Note that CTS efficiency correlates negatively with H3K4me3 and H3K9ac levels, whereas it shows no correlation with the level of H3K27me3.
Fig. 4 Alternative splicing events are often determined co-transcriptionally.
A. A diagram illustrating the different alternative splicing events analysed. B. To quantify each alternative splicing event, the Percent Spliced In index (PSI, see Methods) was calculated for mRNA-seq and CB-RNA-seq data and is shown on scatter plots. The PSI values from the two data sets are highly correlated. The intensity of the blue colour indicates the density of data points. Black dots indicate outliers. C. Examples of alternative splicing events that are determined either co-transcriptionally (left) or post-transcriptionally (right). D. A comparison of SS ratios between introns involved in alternative splicing and other introns from the same group of genes. Alternative splicing (AS) events detected in mRNA-seq data were classified into groups according to the AS types in A, and their corresponding SS ratios were calculated and displayed in the first box plot on the left-hand side. The average (ave), minimum (min) and maximum (max) SS ratios from the same group of genes were also calculated and are shown along the X-axis. The number (N) of alternative splicing events analysed is shown at the top of each chart. A horizontal line was added to better visualize the difference in median level between box plots. p values at the top of each box plot were calculated relative to the first box plot on the left-hand side using the Wilcoxon test.
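The comparison in panel B of Fig. 4 can be sketched as follows. The sketch assumes the common read-count form of PSI (inclusion reads divided by inclusion plus exclusion reads) and a Pearson correlation; the exact PSI definition is given in the Methods and the type of correlation coefficient is not stated in the legend, so both choices, together with the function names and the toy counts, are assumptions.

```python
from math import sqrt


def psi(inclusion_reads, exclusion_reads):
    """Percent Spliced In as a fraction between 0 and 1 (assumed read-count form)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads for this event")
    return inclusion_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy (inclusion, exclusion) read counts per event; not data from this study.
cb_rna_counts = [(86, 14), (9, 91), (40, 60), (70, 30)]
mrna_counts = [(93, 7), (1, 99), (35, 65), (80, 20)]
psi_cb = [psi(i, e) for i, e in cb_rna_counts]
psi_mrna = [psi(i, e) for i, e in mrna_counts]
print(pearson_r(psi_cb, psi_mrna))
```

An event whose PSI is similar in the two libraries falls near the diagonal of the scatter plot; a large difference would point to a decision made after release from chromatin.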
Fig. 5 RZ-1B and RZ-1C globally promote efficient CTS.
A. Distribution of introns according to their 3ʹSS ratios in Col-0 (WT) and in the rz-1b rz-1c double mutant. B. Box plot showing 5ʹSS and 3ʹSS values in CB-RNA for the 6,119 exon-intron-exon units that were differentially spliced in rz-1b rz-1c compared with Col-0. C. Box plot showing 5ʹSS and 3ʹSS values in mRNA for the same set of exon-intron-exon units as in B. For B and C, p values were calculated between the red and green box plots using a two-tailed t-test. D. A diagram illustrating the principle of eCLIP-seq. E. Autoradiograph indicating successful harvesting of RZ-1C-bound RNA after IP and denaturing gel electrophoresis in the RZ-1C eCLIP experiment. The signal comes from 32P end-labelling of RNA. High-concentration RNase treatment released the long RNAs associated with RZ-1C, and therefore only a band corresponding to the molecular weight of GFP-RZ-1C is observed. F. Distribution of eCLIP XLS (crosslink sites) among exons, introns, 5ʹUTRs and 3ʹUTRs. Numbers in brackets indicate the number of sites in each category. G. Overlap between genes with RZ-1C-eCLIP XLS and genes with altered CTS efficiency (the same set of genes as shown in B and C). H and I. Two examples of genes that show both RZ-1C binding and reduced splicing efficiency. Sequencing results from CB-RNA, mRNA, GFP-RZ-1C-eCLIP and GFP-eCLIP are shown as different panels in H, and affected introns are shown as grey boxes. qPCR validation of altered splicing efficiency and of RZ-1C RNA binding (by RIP-qPCR, see Methods) is shown in I; primer locations are indicated at the top of the bar charts. Data are presented as mean ± s.e.m. (n = 3). Asterisks indicate a significant difference based on a two-tailed t-test (p < 0.05).
Fig. 6 RZ-1B and RZ-1C promote CTS through both local and global modes.
A. Distribution of GFP-RZ-1C-eCLIP XLS along the gene. Different line colours indicate groups of different expression levels as judged by mRNA-seq data. B. Distribution of GFP-RZ-1C-eCLIP XLS along the exon-intron-exon structure. The number of XLS was normalized to the read density in CB-RNA-seq and is shown as a log value on the Y-axis. Different line colours indicate groups of different binding patterns. C. Distribution of RZ-1C-bound genes based on the position of XLS in those genes. D. Box plot showing the abundance of RZ-1C-bound and unbound introns according to CB-RNA-seq. E. Box plot showing the enrichment of exon-intron junction reads and exon-exon junction reads in GFP-RZ-1C-eCLIP. Only reads with XLS in exons were counted. At each exon-exon or exon-intron junction, the eCLIP read density was normalized to the read density in CB-RNA-seq and is shown as a log value on the Y-axis. For D and E, p values were calculated between the red and green box plots using the Wilcoxon test. F. MEME-ChIP analysis identified the top eight motifs enriched within ±10 bp windows of eCLIP XLS. G. Top, a gene model illustrating an RZ-1C binding target, with the splicing-affected (in the rz-1c rz-1b mutant) exon-intron-exon units shown as black boxes and the random exon-intron-exon units as grey boxes. Bottom, the black bars summarize the GFP-RZ-1C-eCLIP XLS distribution in the splicing-affected units, while the grey bars summarize the GFP-RZ-1C-eCLIP XLS distribution in units randomly taken from other regions of the same genes. The results are summarized using the different types of eCLIP reads displayed along the X-axis. p values were calculated using Fisher's exact test. H. A gene model emphasising the splicing-affected (in the rz-1c rz-1b mutant) exon-intron-exon units with or without RZ-1C binding. The black bars indicate the ratio of splicing-affected units with RZ-1C binding, while the grey bars indicate the ratio of splicing-affected units without RZ-1C binding. The results are summarized using the different types of eCLIP reads displayed along the X-axis. I. Distribution of genes that are less effectively spliced in rz-1b rz-1c (red) according to their intron numbers. The distribution of the remaining genes in the genome is shown in grey. J. A cartoon illustrating the proposed working model of RZ-1B and RZ-1C in promoting CTS. In the local mode, RZ-1C promotes splicing by binding to the affected intron or to an exon adjacent to it. In the global mode, RZ-1C promotes splicing by binding to exons that can be either proximal or distal to the affected intron. Pol II, RNA polymerase II.
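The enrichment comparison in panel G of Fig. 6 relies on an exact test on a 2x2 table. A minimal standard-library sketch of a two-sided Fisher's exact test is given below. The table layout (splicing-affected versus random units, with versus without XLS), the function name and the counts are hypothetical; the legend does not specify how the tables were built or which implementation was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lowest, highest + 1)
                if prob(k) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: units with XLS / without XLS,
# for splicing-affected units (first row) and random units (second row).
print(fisher_exact_two_sided(30, 70, 15, 85))
```

The exact test is a natural choice here because some categories of eCLIP reads may give small counts, for which a chi-squared approximation would be unreliable.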
[Figure 1. Panels A-H. A and B: CB-RNAs = RNA e (elongating RNA) + RNA f (full-length RNA), drawn between TSS and PAS. D and E: CB-RNA and mRNA tracks for At1g03060.1, At1g03070.2, At1g03080.1 and At1g67140.3. F: reads coverage of CB-seq and mRNA-seq across exon-intron-exon features. G: definition of the 5ʹSS and 3ʹSS ratios from intronic and exonic reads (25 bp) at each splice site. H: SS ratio (0.0 to 1.0) for 5ʹSS and 3ʹSS in Col-0.]
[Figure 2. Panels A-H. Y-axis: SS ratio (5ʹSS and 3ʹSS). X-axes: gene FPKM (highest to lowest), gene length (longest to shortest), intron number (1, 2, 3, 4, 5, >5), exon length and intron length (longest to shortest). D: gene length (kb) against intron number, r = 0.81. E: gene length classes 1 ± 0.2 kb, 2 ± 0.2 kb and 3 ± 0.2 kb. F: intron number classes one, two and three.]
[Figure 3. Y-axis: histone modification level (H3K4me3, H3K9ac, H3K27me3). X-axis: exon-intron-exon features. Lines: intron groups from highest to lowest 5ʹSS or 3ʹSS ratio.]
[Figure 4. Panels A-D. A: exon skipping (ES), alternative 5ʹ splicing site (A5SS), alternative 3ʹ splicing site (A3SS) and intron retention (IR). B: PSI from mRNA-seq and CB-RNA-seq; A3SS N = 1059, r = 0.87; A5SS N = 597, r = 0.9; ES N = 489, r = 0.89; IR N = 887, r = 0.8. C: isoforms of At5g53860 and At4g25500 with PSI values in CB-RNA and mRNA, labelled as co-transcriptional or post-transcriptional decisions. D: SS ratio (3ʹSS and 5ʹSS) for ES, A3SS, A5SS and IR events compared with the ave, min and max of the same genes.]
[Figure 5. Panels A-I. A: numbers of introns by 3ʹSS ratio in Col-0 and rz-1b rz-1c. B and C: SS ratio in CB-RNA and in mRNA for Col-0 and rz-1b rz-1c. D: eCLIP workflow (UV crosslinking in vivo; stringent purification of the protein/RNA complex and adaptor ligation; proteinase K digestion leaves a polypeptide at the crosslink site; reverse transcription stops at the peptide; cDNA adaptor ligation, PCR and sequencing to pinpoint the crosslink site). E: GFP-RZ-1C (~70 kDa) with low (1:700) and high (1:25) RNase I. F: XLS counts by feature, including intron (90257), 3ʹUTR (94026) and 5ʹUTR (130762). G: overlap of RZ-1C-bound genes (eCLIP) and genes differentially spliced in rz-1b rz-1c at chromatin. H and I: AT4G27000 and AT1G17220 tracks (CB-RNA in Col-0 and rz-1b rz-1c; RZ-1C-GFP and GFP-only eCLIP), unspliced/spliced ratios (At4g27000 Int.1 and Int.2, At1g17220 Int.6) and RIP (IP as % of input) at primers P1 and P2.]
[Figure 6. Panels A-J. A: reads coverage of XLS from -1 kb of the TSS to +1 kb of the PAS, grouped by gene FPKM. B: Log2 (iCLIP/CB-RNA) across upstream exon, intron and downstream exon for units with intronic XLS, units with only exonic XLS and all RZ-1C-bound units. C: ratio of genes with XLS in exons only, in introns only, or in both introns and exons. D: CB-RNA reads coverage of introns bound and not bound by RZ-1C. E: Log2 (iCLIP/CB-RNA) at exon-intron and exon-exon junctions. F: motif logos (bits) with E-values. G: ratio of units showing RZ-1 binding in splicing-affected and random units. H: ratio of units showing differential splicing in rz-1c rz-1b, with or without RZ-1C binding in the splicing-affected unit. I: distribution by intron number of genes less effectively spliced in rz-1b rz-1c and of other genes. J: model with Pol II, RZ-1B, RZ-1C and SR proteins showing global and local splicing regulation of splicing-affected and splicing-unaffected introns.]