The Features and Regulation of Co-transcriptional Splicing in Arabidopsis


Journal Pre-proof The features and regulation of co-transcriptional splicing in Arabidopsis Danling Zhu, Fei Mao, Yuanchun Tian, Xiaoya Lin, Lianfeng Gu, Hongya Gu, Li-jia Qu, Yufeng Wu, Zhe Wu

PII: S1674-2052(19)30368-5
DOI: https://doi.org/10.1016/j.molp.2019.11.004
Reference: MOLP 851

To appear in: MOLECULAR PLANT
Accepted Date: 15 November 2019

Please cite this article as: Zhu D., Mao F., Tian Y., Lin X., Gu L., Gu H., Qu L.-j., Wu Y., and Wu Z. (2019). The features and regulation of co-transcriptional splicing in Arabidopsis. Mol. Plant. doi: https://doi.org/10.1016/j.molp.2019.11.004.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. All studies published in MOLECULAR PLANT are embargoed until 3PM ET of the day they are published as corrected proofs on-line. Studies cannot be publicized as accepted manuscripts or uncorrected proofs. © 2019 The Author

Title: The features and regulation of co-transcriptional splicing in Arabidopsis

Authors: Danling Zhu1,†, Fei Mao2,†, Yuanchun Tian2, Xiaoya Lin3,5, Lianfeng Gu4, Hongya Gu3, Li-jia Qu3, Yufeng Wu2,* and Zhe Wu1,*

1 Institute of Plant and Food Research, Department of Biology, Southern University of Science and Technology, Shenzhen, China, 518055.

2 State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, Nanjing Agriculture University, Nanjing, Jiangsu 210095, China.

3 State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, College of Life Sciences, Peking University, Beijing, China, 100871.

4 Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, China, 350002.

5 Current address: School of Life Sciences, Guangzhou University, Guangzhou, China, 510006.

†These authors contributed equally.

*Co-corresponding authors: [email protected] and [email protected]

Running title: CB-RNA-seq and eCLIP-seq reveal features of CTS in plants.

Short Summary:

Splicing and transcription are often coupled, as demonstrated in various organisms. However, plant genes are much shorter than mammalian genes, raising the question of how frequently plant genes can be spliced during transcription. By developing chromatin-bound RNA-seq and eCLIP-seq, we studied the features and regulation of co-transcriptional splicing, which involves direct RNA binding by an hnRNP-like protein, RZ-1C.

ABSTRACT

Precursor mRNA (pre-mRNA) splicing is essential for gene expression in most eukaryotic organisms. Previous studies from mammals, Drosophila and yeast show that the majority of splicing events occur co-transcriptionally. In plants, however, the nature of co-transcriptional splicing (CTS) and its regulation are still largely unknown. Here, we used chromatin-bound RNA sequencing (CB-RNA-seq) to study CTS in Arabidopsis thaliana. CTS was widespread in Arabidopsis seedlings, with a large proportion of alternative splicing events determined co-transcriptionally. CTS efficiency correlated with gene expression level, the chromatin landscape and, most surprisingly, the number of introns and exons of individual genes, but was independent of gene length. In combination with eCLIP-seq analysis, our results showed that the hnRNP-like proteins RZ-1B and RZ-1C promote efficient CTS globally through direct binding, frequently to exonic sequences. Notably, this general effect of RZ-1B/1C on splicing promotion was mainly at the chromatin level, not at the mRNA level. RZ-1C promotes CTS of multiple-exon genes in association with its binding to regions both proximal and distal to the regulated introns. We propose that RZ-1C promotes efficient CTS of genes with multiple exons through cooperative interactions with many exons, introns and splicing factors. Our work thus reveals important features of CTS in plants and provides methodologies for the investigation of CTS and RNA-binding proteins in plants.

Introduction:

Pre-mRNA splicing is carried out by the spliceosome, a megadalton complex that triggers the precise excision of introns from the pre-mRNA (Jangi and Sharp, 2014; Reddy et al., 2013). Early studies of individual genes implied that splicing could already occur during transcription, a process termed CTS (Bauren and Wieslander, 1994; Beyer and Osheim, 1988; Osheim et al., 1985; Zhang et al., 1994). The potential for CTS is not surprising in organisms with relatively long genes (e.g. human), since the time needed for splicing (a few seconds to 3 min) is much shorter than the time needed for transcribing a gene (often more than 10 min for human genes) (Alexander et al., 2010; Bentley, 2014; Beyer and Osheim, 1988; Huranova et al., 2010; Singh and Padgett, 2009). However, recent genome-wide profiling of nascent RNA has revealed that CTS is the predominant mode of splicing in different species including budding yeast (Ameur et al., 2011; Bhatt et al., 2012; Khodor et al., 2011; Khodor et al., 2012; Tilgner et al., 2012; Oesterreich et al., 2016), an organism with an average gene length of 1-2 kb, indicating rapid CTS. Indeed, in budding yeast, the catalysis of splicing occurs on average 45 bp after Pol II finishes transcribing the 3’ intron acceptor site, much faster than previously anticipated (Oesterreich et al., 2016). Interestingly, in contrast, two recent reports showed that the catalysis of splicing in human cells often occurs when Pol II has travelled more than a kilobase downstream of the intron acceptor site (Drexler et al., 2019; Nojima et al., 2018), indicating that regulation differs significantly between species.

Despite their tight coupling in vivo, transcription and splicing can each occur independently of the other in vitro. How and why these two processes are so tightly coupled in vivo has been the focus of much effort in yeast and mammalian cells. The extent of CTS correlates with different gene features and varies between organisms, tissues and individual genes, indicating great regulatory potential (Brugiolo et al., 2013; Khodor et al., 2012). In particular, the transcription elongation rate, which is regulated by trans factors and chromatin structure, can influence splice-site choice (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998). A faster elongation rate could reduce the chance of weak, promoter-proximal splice sites being utilized, influencing the production of certain mRNA isoforms (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998). This is known as “kinetic coupling”, a mechanism that has been demonstrated in several different organisms (Bentley, 2014; de la Mata et al., 2003; Kornblihtt, 2007; Naftelberg et al., 2015; Roberts et al., 1998) and is involved in flowering time control (Marquardt et al., 2014) and light response (Godoy Herz et al., 2019) in Arabidopsis. In addition, a study in budding yeast showed that slower or faster elongation could also affect the fidelity of splicing in addition to its efficiency, indicating that an optimal elongation rate has likely been selected for in vivo (Aslanzadeh et al., 2018). Notably, recent work in mammalian cells showed that the spliceosome is complexed with Pol II CTD phosphorylated on serine 5 (S5P) but not with Pol II CTD phosphorylated on serine 2 (S2P), suggesting a physical link between splicing and transcription through phosphorylation of the Pol II CTD (Nojima et al., 2015; Nojima et al., 2018). In addition, splicing and transcription can also be coupled by splicing factors that in turn regulate transcription (Das et al., 2007; de la Mata and Kornblihtt, 2006; Ji et al., 2013; Monsalve et al., 2000). The coupling between splicing and transcription is likely also important for RNA quality control. For example, the deposition of the exon-junction complex, a complex important for nonsense-mediated decay of RNA, is largely splicing dependent and co-transcriptional (Alexandrov et al., 2012; Barbosa et al., 2012; Singh et al., 2012; Steckelberg et al., 2012).

Despite its importance, relatively little is known about the coupling between transcription and splicing in plants. In Arabidopsis, CTS has been observed at individual loci such as FLC and DOG1 (Dolata et al., 2015; Rosa et al., 2016; Wu et al., 2016a). For example, single-molecule RNA FISH (smFISH) results show only two dots for unspliced FLC RNA within the nucleus, and these overlap with the FLC DNA FISH signal, indicating efficient co-transcriptional splicing (Rosa et al., 2016). In addition, pNET-seq analysis in Arabidopsis found an enrichment of reads whose 3’ ends map to exon 3’ ends (most likely splicing intermediates) specifically for Pol II CTD S5P (Zhu et al., 2018), consistent with the idea of CTS. Notably, typical plant (e.g. Arabidopsis thaliana) genes are structurally similar to yeast genes, with an average length of 2.4 kb (Reddy, 2007), much shorter than those of mammals, which should theoretically reduce the chance for CTS. Moreover, the majority of alternative splicing events in plants are intron retention, in sharp contrast to the situation in mammals (Reddy, 2007). To date, the extent of CTS, as well as its features and regulation at the whole-genome level, is unknown in plants.

RNA-binding proteins are major regulators in determining the fate of RNAs, including splicing, either co- or post-transcriptionally. The Arabidopsis genome encodes at least 500 putative RNA-binding proteins, the majority of which are functionally unknown (Marondedze et al., 2016; Reichel et al., 2016). Indeed, methodology for studying RNA-binding proteins, such as CLIP-seq, became available only recently and remains technically challenging for plant samples (Meyer et al., 2017; Zhang et al., 2015). A relatively well-studied class of RNA-binding proteins in plants is the SR proteins, a group of proteins that have been extensively linked with alternative splicing (Reddy and Shad Ali, 2011), presumably through their interactions with sequence motifs located in exons or introns (Jeong, 2017). Mammalian SR proteins also regulate Pol II 5’ end pausing and elongation (Ji et al., 2013; Lin et al., 2008; Xiao et al., 2019), suggesting important roles at the co-transcriptional level. We reported previously that a small group of proteins, namely RZ-1B and RZ-1C, are in close association with SR proteins in Arabidopsis (Wu et al., 2016b). Both proteins feature an N-terminal RRM domain, a C-terminal low-complexity region and a zinc-finger motif in the middle (Lorkovic and Barta, 2002). RZ-1s represent hnRNPs (Hanano et al., 1996; Lorkovic and Barta, 2002) and have the potential to interact with multiple SR proteins (Wu et al., 2016b). RZ-1B and RZ-1C have redundant and essential functions in plant development and regulate alternative splicing of more than 100 genes (Wu et al., 2016b). Notably, RZ-1C tightly associates with chromatin and could affect transcription in addition to splicing at individual genes such as FLC, suggesting a possible role in co-transcriptional regulation (Wu et al., 2016b).

Here we studied the features and regulation of CTS in plants by adapting chromatin-bound RNA-seq for Arabidopsis. CTS is widespread in Arabidopsis seedlings. The efficiency of CTS correlates with gene expression level, the distance from the intron 3’ end to the gene 3’ end and histone modifications, consistent with previous reports in yeast and animals and supporting the idea of kinetic coupling. As a novel feature of CTS in plants, we found that the average CTS efficiency of genes correlated with their intron and/or exon numbers while being largely independent of gene length. We further explored the roles of RZ-1B and RZ-1C in CTS regulation. RZ-1B and RZ-1C function to enhance CTS globally. The promotion of CTS by RZ-1C is associated with its direct binding to RNA targets, as revealed through eCLIP-seq, an efficient and powerful technique that we adapted for studying the direct RNA binding of proteins in plants. Our work uncovers general features of CTS in plants and highlights the role of exon/intron numbers as well as RZ-1B/1C in CTS regulation.

Results

Widespread CTS as revealed by CB-RNA-seq in Arabidopsis seedlings.

To test the universality of CTS in plants, we adapted the CB-RNA-seq method for use in Arabidopsis. CB-RNA was extracted from isolated nuclei followed by a stringent urea wash to release non-chromatin-associated proteins and RNAs, a method that has been widely used in both mammalian and yeast systems (Bhatt et al., 2012; Khodor et al., 2011; Oesterreich et al., 2016; Wuarin and Schibler, 1994) (Fig. 1A; Supplementary Fig. 1A and B). We used the ratio of unspliced RNA to spliced RNA at several different genes as an estimate of the enrichment of nascent RNA in our assay. These included FLC, a gene that shows tight co-transcriptional splicing (Rosa et al., 2016). Given the efficient removal of introns at FLC chromatin, one would expect the highest enrichment of unspliced RNA in the chromatin fraction compared with the nuclear and cytosolic fractions. Indeed, this was the case at FLC and two other genes we tested, indicating the successful enrichment of chromatin-bound RNA (Supplementary Fig. 1C to E). The CB-RNA fraction was treated to remove ribosomal RNA and any contaminating poly(A) RNA and then used for strand-specific Illumina RNA-seq.

For a given locus, the CB-RNA reflects the sum of two fractions of RNA molecules (Fig. 1B) (Ietswaart et al., 2017; Wu et al., 2016a): RNAs being transcribed at the locus (RNAe, elongating RNA) and RNAs that have been fully transcribed but not yet released from the chromatin (RNAf, full-length RNA). The latter fraction exists because RNA polymerase II (Pol II) generally pauses at the 3ʹ end of the gene before completion of termination (Zhu et al., 2018). Assuming that RNAe dominates the CB-RNA fraction, we expect a 5ʹ to 3ʹ decrease in the amount of CB-RNA (Fig. 1B). Furthermore, assuming that introns are removed from the pre-mRNA co-transcriptionally, the read coverage at the intronic region should be significantly lower than at the adjacent exonic regions, resulting in a sawtooth-like pattern (Fig. 1C).
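These two expectations can be illustrated with a toy calculation. The Python sketch below is not part of the analysis reported here. It assumes that polymerases are spread evenly along a hypothetical gene and that each intron is removed once Pol II has moved a fixed distance past the intron 3ʹ end (`splice_lag`, an invented parameter). Under these assumptions the relative coverage declines from 5ʹ to 3ʹ and dips within introns, giving the sawtooth shape.

```python
def expected_coverage(gene_length, introns, splice_lag):
    """Relative nascent-RNA coverage at each position of a toy gene.

    Assumption: one polymerase sits at every position 0..gene_length-1, and a
    polymerase at position p has transcribed positions 0..p.
    introns: list of (start, end) with end exclusive.
    splice_lag: hypothetical distance Pol II travels past the intron 3' end
    before that intron is removed from its transcript.
    """
    coverage = []
    for x in range(gene_length):
        # Exonic position: every polymerase at or beyond x carries it.
        n_pol = gene_length - x
        for start, end in introns:
            if start <= x < end:
                # Intronic position: only polymerases that have not yet moved
                # splice_lag nt past the intron 3' end still carry it.
                n_pol = min(gene_length, end + splice_lag) - x
                break
        coverage.append(n_pol / gene_length)
    return coverage


# Toy gene of 1,000 nt with two 100-nt introns (all values are illustrative).
profile = expected_coverage(gene_length=1000,
                            introns=[(200, 300), (600, 700)],
                            splice_lag=50)
for pos in (0, 199, 250, 300, 650, 700):
    print(pos, round(profile[pos], 3))
```

A larger `splice_lag` raises intronic coverage relative to the flanking exons, which corresponds to less efficient CTS in this toy model.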

As expected, we observed a declining pattern from the 5ʹ to the 3ʹ end at many individual loci in our CB-RNA-seq data, but not in the mRNA-seq data (Fig. 1D and E; Supplementary Fig. 1F), indicating successful isolation of CB-RNA. We note that this declining pattern was observed more often in long genes than in short ones, possibly due to the longer time needed by Pol II for elongation at these genes, resulting in RNAe dominating the locus over RNAf. Moreover, the sawtooth pattern of CB-RNA was observed at individual genes as well as globally, indicating general CTS (Fig. 1D to F; Supplementary Fig. 1F). Given the good read coverage of introns as well as exons, and the good correlation between biological replicates (Supplementary Fig. 1G, Supplementary Table 1), we then quantified CTS efficiency throughout the Arabidopsis genome using widely accepted standards (Herzel and Neugebauer, 2015; Khodor et al., 2011; Khodor et al., 2012). At each intron–exon boundary, we calculated the ratio of intronic reads over the adjacent exonic reads to yield a 5ʹ splicing site (5ʹSS) ratio and a 3ʹ splicing site (3ʹSS) ratio (Fig. 1G). The median value for both the 5ʹSS (88,364 introns) and 3ʹSS (89,231 introns) ratios was around 0.2 (Fig. 1H), similar to that found in Drosophila (Khodor et al., 2011), but lower than that in mouse (Khodor et al., 2012). Therefore, we conclude that Arabidopsis pre-mRNAs are generally co-transcriptionally spliced, as in yeast and animals.
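The splice-site ratios described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in this study: the per-base coverage list, the plus-strand orientation and the 25-nt `window` are assumptions made for the example, and the exact read-counting regions are defined in the Methods.

```python
def splice_site_ratios(coverage, intron_start, intron_end, window=25):
    """Return (5'SS ratio, 3'SS ratio) for one plus-strand intron.

    coverage: per-base read depth along the gene (list of numbers).
    intron_start / intron_end: intron coordinates, end exclusive.
    window: hypothetical number of bases compared on each side of a boundary.
    Assumes the windows lie within `coverage` and the intron is >= window long.
    """
    def mean_cov(start, end):
        values = coverage[start:end]
        return sum(values) / len(values)

    def ratio(intron_cov, exon_cov):
        # Undefined when the flanking exon has no coverage.
        return intron_cov / exon_cov if exon_cov > 0 else None

    five_ss = ratio(mean_cov(intron_start, intron_start + window),
                    mean_cov(intron_start - window, intron_start))
    three_ss = ratio(mean_cov(intron_end - window, intron_end),
                     mean_cov(intron_end, intron_end + window))
    return five_ss, three_ss


# Toy coverage: upstream exon at depth 10, intron at depth 2, downstream exon at depth 8.
toy_coverage = [10] * 100 + [2] * 100 + [8] * 100
five_ss, three_ss = splice_site_ratios(toy_coverage, 100, 200)
print(five_ss, three_ss)
```

A ratio close to 0 indicates that the intron is largely absent from chromatin-bound RNA (efficient CTS), whereas a ratio close to 1 indicates that it is mostly retained.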

CTS efficiency correlated with gene expression level.

Accumulating evidence indicates that transcription and splicing are coupled, i.e., Pol II transcription can help determine splice-site choice and efficient assembly of an active spliceosome can change the extent of CTS (Godoy Herz et al., 2019; Naftelberg et al., 2015). We therefore determined whether CTS efficiency was related to the level of gene expression. For high-confidence analysis, we excluded data with a 5ʹSS or 3ʹSS ratio equal to or greater than one. In these cases, reads often could not be assigned unambiguously to an intron or to an alternative exon in the same region. The rest of the data, i.e., 5ʹSS and 3ʹSS ratios distributed between 0 and 1, were plotted against the corresponding gene expression level (as determined by the mRNA-seq data). Both the 5ʹSS and 3ʹSS ratios negatively correlated with the level of gene expression (Fig. 2A; Supplementary Fig. 2), suggesting that the pre-mRNAs of highly expressed genes were processed at a higher rate than those of weakly expressed genes.

CTS efficiency correlated with exon and/or intron numbers.

In mouse and Drosophila, gene structure influences splicing efficiency (Khodor et al., 2011; Khodor et al., 2012). Therefore, we plotted the 5ʹSS and 3ʹSS ratios against different gene features. The first and last introns were often less efficiently removed at the co-transcriptional level compared with the internal introns (Supplementary Fig. 3A). Furthermore, the longer the distance between the intron 3ʹ end and the gene 3ʹ end, the more efficient the CTS (Supplementary Fig. 3B). This suggests that sufficient co-transcriptional time is beneficial to efficient CTS, consistent with previous reports in other organisms (Khodor et al., 2011; Khodor et al., 2012). We observed that both gene length and number of introns/exons negatively correlated with the average 5ʹSS and 3ʹSS ratios (Fig. 2B and C; Supplementary Fig. 4A). Since intron number and gene length correlate with each other (Fig. 2D), we wondered which one was more relevant to the regulation of CTS efficiency. Surprisingly, the negative correlation between intron number and the 5ʹSS and 3ʹSS ratios remained, even for gene groups with fixed lengths (Fig. 2E; Supplementary Fig. 4B). By contrast, no obvious trend was found between gene length and 5ʹSS or 3ʹSS ratio among genes with a fixed number of introns (Fig. 2F; Supplementary Fig. 4C and D). We also used a general linear model to statistically assess the effect of intron number or gene length on CTS efficiency (see Methods). The ANOVA test indicated that intron number was a significant factor (p < 2.2e-16), rather than gene length (p = 0.88) or the interaction between intron number and gene length (p = 0.84). Furthermore, intron number did not correlate with gene expression level (Supplementary Fig. 4E), excluding the involvement of intron number in the regulation of CTS via effects on gene expression. Unlike the case in animals, there was no obvious trend between exon or intron length and the splicing efficiency of its own or its adjacent exon (Fig. 2G and H). Overall, we conclude that the number of introns and/or exons was an important contributor to efficient CTS, independent of gene length.
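The stratified comparison described above (holding gene length fixed while varying intron number, and the reverse) can be sketched as follows. This Python example is only an illustration of the idea under stated assumptions: the dictionary keys, the length bin and the toy values are invented, and it does not reproduce the general linear model or ANOVA used in this study.

```python
from collections import defaultdict
from statistics import median


def median_ratio_by_group(genes, key, fixed, fixed_range):
    """Median splice-site ratio per value of `key`, using only genes whose
    `fixed` attribute falls in the half-open interval fixed_range = (lo, hi)."""
    lo, hi = fixed_range
    groups = defaultdict(list)
    for gene in genes:
        if lo <= gene[fixed] < hi:
            groups[gene[key]].append(gene["ratio"])
    return {value: median(ratios) for value, ratios in sorted(groups.items())}


# Invented toy records: gene length (bp), intron count, mean 5'SS/3'SS ratio.
toy_genes = [
    {"length": 2000, "introns": 2, "ratio": 0.30},
    {"length": 2400, "introns": 2, "ratio": 0.34},
    {"length": 2200, "introns": 6, "ratio": 0.18},
    {"length": 2900, "introns": 6, "ratio": 0.22},
    {"length": 5000, "introns": 6, "ratio": 0.20},
]

# Compare intron-number groups among genes of similar length (2-3 kb).
by_introns = median_ratio_by_group(toy_genes, key="introns",
                                   fixed="length", fixed_range=(2000, 3000))
print(by_introns)
```

Swapping the roles of `key` and `fixed` (for example, binning genes by length among those with exactly six introns) gives the complementary comparison.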

CTS efficiency negatively correlated with certain histone marks.

Transcription and CTS both occur within the context of chromatin in vivo and are therefore interlinked (Aslanzadeh et al., 2018; de Almeida and Carmo-Fonseca, 2014; Ullah et al., 2018). We determined whether CTS efficiency is related to the level of certain histone marks within the chromatin, i.e. H3K4me3, H3K9ac and H3K27me3, for which data from the same seedling stage are available in public databases (see Methods). There was a clear correlation between the 5ʹSS or 3ʹSS ratio and the levels of H3K4me3- or H3K9ac-marked histones (Fig. 3). However, no correlation was found between CTS and the level of H3K27me3-marked histones. As previously reported and consistent with the “kinetic coupling” model, higher H3K4me3 or H3K9ac levels may indicate faster elongation (Church and Fleming, 2018; Nagai et al., 2017), which could in turn reduce the time Pol II spends on the chromatin, thereby reducing the efficiency of CTS.

Decisions on alternative splicing are often made co-transcriptionally.

Alternative splicing is an important regulatory step in gene expression in higher eukaryotes (Jangi and Sharp, 2014; Reddy et al., 2013; Staiger and Brown, 2013). Examples in mammalian cells showed that splicing of an alternative exon can happen either co- or post-transcriptionally (Pandya-Jones and Black, 2009; Saldi et al., 2016; Vargas et al., 2011). Furthermore, it was proposed that splicing commitment, but not necessarily catalysis, occurs at the chromatin (de la Mata et al., 2010). We therefore wondered to what extent alternative splicing is determined co-transcriptionally in plants. We focused on exon-skipping (ES), alternative 5ʹ splice site (A5SS) and alternative 3ʹ splice site (A3SS) events (Fig. 4A) that were annotated in the TAIR10 genome and could be assessed in our Col-0 mRNA-seq data (see Methods). In these events, reads spanning an exon-exon junction were used to determine the relative ratio of different isoforms from the same locus, reflected as a PSI value (Percentage Spliced In value; see Methods). Unlike intron retention (IR), the abundance of isoforms in these events can be calculated without the confounding effects of unprocessed introns in the CB-RNA. Comparing the mRNA-seq and CB-RNA-seq data, we found that the PSI values from the two samples were generally correlated (Fig. 4B), indicating that decisions on alternative splicing are often made at chromatin. It is worth noting that decisions made post-transcriptionally do exist at some individual loci (Fig. 4C). In addition, compared with constitutively removed introns at the same locus, introns involved in alternative splicing were generally less efficiently removed co-transcriptionally, although they were often not the least efficient (Fig. 4D). Therefore, our data indicate that for alternative splicing events including ES, A5SS and A3SS in Arabidopsis seedlings, the decision on alternative splicing is often made at chromatin.
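For readers unfamiliar with PSI, the quantity can be sketched from junction-spanning read counts as below. This is a generic illustration, not the exact formula of this study (which is given in the Methods): the function name, its parameters and the optional per-junction normalization are assumptions, and the read counts are invented.

```python
def psi(inclusion_reads, exclusion_reads, inclusion_junctions=1):
    """Percentage Spliced In from junction-spanning read counts.

    inclusion_reads: reads supporting the isoform that includes the
        alternative region.
    exclusion_reads: reads supporting the isoform that excludes it.
    inclusion_junctions: number of junctions that report inclusion (for an
        exon-skipping event, inclusion is supported by two junctions, so
        dividing by 2 is a common normalization; treated here as an assumption).
    Returns None when no informative reads are present.
    """
    inclusion = inclusion_reads / inclusion_junctions
    total = inclusion + exclusion_reads
    if total == 0:
        return None
    return 100.0 * inclusion / total


# Invented counts for one event in the two libraries being compared.
psi_mrna = psi(inclusion_reads=30, exclusion_reads=10)
psi_cb_rna = psi(inclusion_reads=40, exclusion_reads=10, inclusion_junctions=2)
print(psi_mrna, psi_cb_rna)
```

Comparing such values event by event between mRNA-seq and CB-RNA-seq is the kind of comparison summarized in Fig. 4B.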

RZ-1B and RZ-1C enhance the efficiency of CTS globally.

We next tested how CTS is regulated by trans-factors. Among many candidates, we chose to test RZ-1B and RZ-1C as a starting point, given that they associate with chromatin and interact with a group of SR proteins (see Introduction; Wu et al., 2016b), the latter being important splicing regulators across different species. The effect of RZ-1B and RZ-1C on CTS was tested through CB-RNA-seq. Surprisingly, the distribution of the 5ʹSS and 3ʹSS ratios of the mutant shifted towards higher values and the means of the ratios increased, suggesting that the RZ-1 proteins positively regulate the CTS efficiency of a large number of genes (Fig. 5A; Supplementary Fig. 5A and B). Indeed, we found that 6,119 exon-intron-exon units from 3,046 genes were significantly differentially spliced at the co-transcriptional level in the rz-1b rz-1c mutant compared with Col-0 (false discovery rate (FDR) < 0.001; see Methods, Supplementary Table 2; Fig. 5B). Among them, 98% of units (5,971 of 6,119) were less efficiently spliced in rz-1b rz-1c. This general effect of RZ-1 on CTS efficiency contrasts with its relatively subtle effects on splicing based on mRNA-seq. The 5ʹSS or 3ʹSS ratio, of either all introns or those that were differentially spliced at chromatin, remained unchanged between Col-0 and rz-1b rz-1c, based on mRNA-seq data (Fig. 5C; Supplementary Fig. 5C). Indeed, using the same criteria as for CB-RNA, we found only 502 exon-intron-exon units that were less efficiently spliced in rz-1b rz-1c compared with Col-0 based on mRNA-seq data (Supplementary Fig. 5D, yellow circle). These results suggest that although the co-transcriptional splicing rate is slower in rz-1b rz-1c, the introns were generally removed effectively from mature mRNA. Taken together, the above results indicate that RZ-1B and RZ-1C are generally required for efficient CTS.

RZ-1C promotes CTS in association with its direct RNA-binding

We then searched for RNA targets bound by the RZ-1 proteins in vivo to further assess how these proteins regulate CTS. Crosslinking and immunoprecipitation (CLIP) analysis has been shown to be the most reliable method for faithfully detecting RNA-protein interactions; however, it is extremely challenging in plants (Meyer et al., 2017; Zhang et al., 2015). We combined the individual-nucleotide resolution CLIP (iCLIP) protocol (Konig et al., 2010) with the recently developed enhanced CLIP (eCLIP) protocol (Van Nostrand et al., 2016) and adapted it for our research (see Methods, Fig. 5D). We kept the radioisotope labeling of RNA as in the original iCLIP protocol to ensure the faithful detection of RNA bound by RZ-1C (Fig. 5E). We then followed the eCLIP protocol for generation of the sequencing library owing to its higher efficiency and reliability (Van Nostrand et al., 2016). RZ-1C fused with GFP and expressed in its genomic context (RZ-1C:GFP-RZ-1C) was able to complement the rz-1b rz-1c mutant (Wu et al., 2016b). This transgenic line and a 35S:GFP construct (as negative control) were used in our eCLIP analysis (Fig. 5E). Peak calling and subtraction of any site that appeared in the 35S:GFP data were performed over biological replicates (Supplementary Fig. 6A, Supplementary Table 1), resulting in the identification of 2,295,498 RZ-1C cross-linked sites (XLS) in the Arabidopsis genome. Of these sites, 1,980,453 (86.3%), 90,257 (3.9%), 130,762 (5.7%) and 94,026 (4.1%) were located in exons, introns, 5ʹUTRs and 3ʹUTRs, respectively (Fig. 5F). These XLS were distributed over 10,937 genes (Supplementary Table 3), suggesting that RZ-1C binds to a broad range of RNAs in vivo. Furthermore, among the 3,046 genes identified above as differentially spliced on chromatin in the rz-1b rz-1c mutant (Fig. 5B), 84% had RZ-1C eCLIP XLS (Fig. 5G). This is significantly higher than the percentage of total genes bound by RZ-1C (38%, hypergeometric test, p < 2.2e-16) or the percentage of expressed genes (FPKM > 1) at the same stage (55%, hypergeometric test, p < 2.2e-16). This suggests a direct link between RZ-1C and the promotion of CTS at those genes. The binding of the RZ-1C protein to RNA as well as its effect on CTS were further validated on selected genes through RNA immunoprecipitation-qPCR (RIP-qPCR) and CB-RNA or mRNA extraction followed by qPCR analysis of splicing efficiency (Fig. 5H and I; Supplementary Fig. 6B and C). Notably, consistent with the sequencing results, the difference in splicing efficiency between WT and rz-1b rz-1c was not observed in mRNA samples in qPCR analysis (Supplementary Fig. 7). Overall, our eCLIP data indicated that the RZ-1C protein promotes efficient CTS of several thousand genes, involving its direct binding to their RNAs.
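The hypergeometric enrichment test mentioned above asks how likely it is to see at least the observed overlap between two gene sets by chance. The Python sketch below shows one way to compute that upper-tail probability with exact integer arithmetic. It is an illustration only: the function and parameter names are hypothetical, and the small numbers in the example are invented rather than taken from this study.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered.
    successes:  genes in the population with the property (e.g. bound by a protein).
    draws:      size of the gene set being tested (e.g. differentially spliced genes).
    observed:   number of genes in the tested set that have the property.
    """
    upper = min(draws, successes)
    # math.comb returns 0 when k > n, so impossible terms drop out naturally.
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(population, draws)


# Invented toy example: 10 genes, 4 bound, 3 tested, at least 2 of them bound.
p_value = hypergeom_upper_tail(population=10, successes=4, draws=3, observed=2)
print(p_value)
```

For the comparison reported here, `draws` would be the 3,046 differentially spliced genes, `successes` the 10,937 RZ-1C-bound genes and `observed` the number of the 3,046 genes carrying XLS; the population size depends on the gene annotation used and is not restated in this section.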

RZ-1C binds to both introns and exons with different strength.

We then monitored the distribution of RZ-1C XLS across loci to further dissect the mechanism of RZ-1C-regulated CTS. RZ-1C XLS were located well within the predicted transcription start sites (TSS) and polyadenylation sites (PAS) (Fig. 6A). As shown in Fig. 5F, the apparent enrichment of RZ-1C XLS at exons over introns led us to ask whether this could be due to the greater abundance of exons than introns at steady state within the cell. Since RZ-1C localizes exclusively within the nucleus and associates with chromatin (Wu et al., 2016b), the CB-RNA-seq data could serve as a good input control for our eCLIP analysis. To judge the binding strength of RZ-1C with different RNAs, we normalized the eCLIP data to the sequencing read density calculated from the CB-RNA-seq data. In general, RZ-1C still favored exons over introns after this normalization (Fig. 6B, green line). Among the 10,937 genes bound by RZ-1C, RZ-1C XLS were detected solely at exons, at both exons and introns, and solely at introns in 7,496, 3,370 and 71 genes, respectively (Fig. 6C). Importantly, RZ-1C-bound and unbound introns are of similar abundance in the CB-RNA-seq data, indicating that the absence of RZ-1C XLS from the introns of the 7,496 genes is not due to a different abundance of their introns (Fig. 6D). Interestingly, in cases where RZ-1C binds to both an intron and the adjacent exon, the intronic binding was substantially stronger than the exonic binding when taking into account their steady-state levels at chromatin (Fig. 6B, black line). Therefore, RZ-1C binds only to exons at the majority of its targets, while in cases where it does bind to introns, the binding strength is often stronger compared with the adjacent exons.

RZ-1C binds to unspliced exons more strongly than to spliced exons.

Given that RZ-1C binds only to exons at the majority of its targets, we further wondered whether its binding had any preference towards spliced or unspliced exons. We focused on exon-located XLS whose corresponding sequencing reads span either an exon-intron junction or an exon-exon junction. In these two cases, the splicing state of the corresponding RNA bound by RZ-1C can be determined faithfully. After normalization to the read density calculated from the CB-RNA-seq data at each corresponding location, exon-intron spanning RNAs bound by RZ-1C were at higher levels than exon-exon spanning RNAs (Fig. 6E). This result was further supported by RIP-qPCR at individual loci. We designed primers specific for different forms of RNA and quantified the strength of binding as a function of input (chromatin RNA before the IP) level, and found that RZ-1C binding (as a percentage of input) was significantly higher at the exon-intron junction than at either the exon only or the exon–exon junction (Supplementary Fig. 8), suggesting stronger binding to unspliced exons.

355 35:
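
The idea of reading the splicing state from junction-spanning reads can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, alignments are represented as sorted half-open (start, end) blocks on the plus strand, and the intron occupies the interval from exon_end to intron_end. It is not the code used in the study.

```python
def splicing_state(read_blocks, exon_end, intron_end):
    """Classify one read relative to one intron.

    read_blocks: sorted list of half-open (start, end) aligned blocks.
    The intron occupies [exon_end, intron_end).
    """
    # Spliced: two consecutive blocks joined exactly across the intron.
    for (_, end1), (start2, _) in zip(read_blocks, read_blocks[1:]):
        if end1 == exon_end and start2 == intron_end:
            return "spliced"
    # Unspliced: one contiguous block runs across either intron boundary.
    for start, end in read_blocks:
        if start < exon_end < end or start < intron_end < end:
            return "unspliced"
    # Reads that do not touch a junction say nothing about the splicing state.
    return "uninformative"
```

Reads classified as "uninformative" correspond to exon-only reads, which is why the analysis above is restricted to reads spanning an exon-intron or exon-exon junction.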

U-rich and GA-rich motifs were enriched in sequences surrounding RZ-1C XLS.

Next, we searched for specific motifs to which RZ-1C could preferentially bind. Multiple EM for Motif Elicitation (MEME, see Methods) analysis revealed several degenerate motifs (Fig. 6F). The enrichment of U-rich motifs was most likely due to the preference of UV crosslinking for such sequences. Importantly, GA-rich motifs were also enriched, similar to those previously identified in RZ-1C systematic evolution of ligands by exponential enrichment (SELEX) experiments (Wu et al., 2016b), indicating that GA-rich sequences are likely to be bound by RZ-1C with high affinity. Notably, GA-rich motifs are also bound by other splicing regulators such as SR proteins (Clery et al., 2011; Xing et al., 2015) and are important for splicing regulation across different species. Therefore, in the case of RZ-1C, plants use cis-elements similar to those in mammals to regulate co-transcriptional splicing.

RZ-1C promotes CTS through both local and global modes.

The above data suggest that RZ-1C plays a critical role in enhancing the efficiency of CTS, often through binding to exonic sequences, but do not fully elucidate its mode of action. We propose a model that summarizes two alternative, but not mutually exclusive, modes of action for RZ-1C in regulating CTS. In the first, “local” mode, RZ-1C would act as a canonical splicing factor, binding cis-elements in the exon or intron to define and facilitate the removal of the adjacent intron (Fig. 6J, blue arrow). In the second, “global” mode, RZ-1C would facilitate removal of an intron through binding to exons or introns distal to it (Fig. 6J, black arrow). We have evidence for both modes based on our CB-RNA-seq and eCLIP-seq data. A total of 2572 genes showed both differential CTS efficiency and RZ-1C binding (Fig. 5G). Within these 2572 genes, 5408 splicing-affected exon-intron-exon units were identified. Comparing RZ-1C binding in the splicing-affected units with that in random units taken from the same genes, we found a higher percentage of RZ-1C binding to splicing-affected units, supporting the “local” mode of action (Fig. 6G). Consistent with our previous results, RZ-1C bound a higher percentage of exons than introns within the splicing-affected units (Fig. 6G). Meanwhile, for intronic binding, a significant difference between the splicing-affected units and the random units was also observed (Fig. 6G), suggesting that intronic binding is also important for the promotion of CTS by RZ-1C. Further, 27 % of the 5408 units did not have any RZ-1C binding within the unit itself but still showed defective splicing (Fig. 6H, “Total”). In these cases, RZ-1C binding distal to the affected intron would be important for its efficient co-transcriptional removal, supporting the “global” mode. Thus, we propose a working model of RZ-1B and RZ-1C in the regulation of CTS, in which RZ-1C binds to exons and, in fewer cases, introns, and can promote the co-transcriptional removal of introns that are both proximal and distal to it (Fig. 6J).
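
The distinction between the two modes can be made concrete with a small sketch. The record type, field names and labels below are hypothetical; the sketch only assumes that each splicing-affected exon-intron-exon unit is annotated with whether RZ-1C binds inside the unit and whether it binds elsewhere in the same gene.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One splicing-affected exon-intron-exon unit (hypothetical record)."""
    gene: str
    bound_in_unit: bool    # any RZ-1C XLS inside the unit itself
    bound_elsewhere: bool  # any RZ-1C XLS elsewhere in the same gene


def classify(unit):
    """Label a unit by the mode of action it is compatible with."""
    if unit.bound_in_unit:
        return "local"
    if unit.bound_elsewhere:
        return "global"
    return "unbound"


def fraction_by_mode(units):
    """Return the fraction of units carrying each label."""
    counts = {"local": 0, "global": 0, "unbound": 0}
    for unit in units:
        counts[classify(unit)] += 1
    total = len(units)
    if total == 0:
        return counts
    return {label: n / total for label, n in counts.items()}
```

Under this labeling, the 27 % of units without any RZ-1C binding inside the unit would fall into the "global" class, because all of them belong to genes bound by RZ-1C.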

Discussion

Numerous studies, including structural studies, have shown that splicing is a generally conserved process from yeast to human, which can happen either during or after transcription. With our data added, current evidence indicates that co-transcriptional splicing is a predominant mode in all eukaryotes tested so far. In addition, several lines of our data are consistent with the kinetic coupling model between transcription and splicing proposed previously (see Introduction). CTS efficiency correlated with the distance between the intron 3’ end and the gene 3’ end, while it correlated negatively with active histone marks (H3K4me3 and acetylation) related to rapid transcriptional elongation. Notably, recent evidence showed that Pol II and the splicing machinery could also be coupled through the Pol II CTD phosphorylated on serine 5 (S5P). Nascent elongating transcript sequencing (NET-seq) analysis with an antibody against Pol II CTD S5P in mammalian cells showed enrichment of sequencing reads with their 3’ ends mapped precisely to the exon 3’ end, indicating that such reads could be derived from splicing intermediates that came not from the Pol II active center but from spliceosomes associated with Pol II CTD S5P (Nojima et al., 2015; Nojima et al., 2018).

How kinetic coupling fits with the physical coupling regulated by dynamic phosphorylation of the Pol II CTD is an interesting question to be addressed in the future. Related to this, we looked at how CTS efficiency was related to Pol II levels as judged by the high-resolution pNET-seq data published previously (Zhu et al., 2018). Interestingly, a positive correlation was observed between the 5’SS or 3’SS ratio and the steady-state levels of Pol II carrying CTD S5P and/or CTD S2P modifications (Supplementary Fig. 9). These data are consistent with the observed correlation between SS ratio and active histone marks, given that H3K4me3 and acetylation also indicate frequent Pol II initiation in addition to fast elongation (Wozniak and Strahl, 2014). Therefore, fast Pol II initiation and elongation could result in both higher Pol II levels and relatively low CTS efficiency (due to fast elongation), a scenario similar to that described for the FLC locus (Wu et al., 2016a). In addition, we noted a significant peak associated with Pol II CTD S5P but not Pol II CTD S2P at the junction of the exon 3’ end and the intron 5’ end (Supplementary Fig. 9), most likely representing splicing intermediates (Nojima et al., 2015; Zhu et al., 2018). Consistent with this explanation, such a peak is more pronounced in exon-intron-exon groups with low SS ratios, a scenario that would produce such splicing intermediates more frequently during transcription.

We found in this study that RZ-1C is a relatively specific regulator of CTS rather than an essential factor for splicing in general. Indeed, most CTS defects observed in rz-1b rz-1c were not associated with intron retention at the steady-state mRNA level (Fig. 5B, C; Supplementary Fig. 5B to D; Supplementary Fig. 7). In these cases, despite the slower rate of splicing at chromatin in rz-1b rz-1c, effective splicing seems to occur at the post-transcriptional level, ensuring successful intron removal before mRNA maturation. In addition, frequent intron retention may activate surveillance pathways that selectively degrade such transcripts (e.g. nonsense-mediated decay). Future work will be needed to distinguish between these possibilities. Nevertheless, RZ-1B and RZ-1C likely work at the interface between transcription and mRNA maturation, and the efficient CTS governed by RZ-1B/1C may be part of the pipeline that ensures successful mRNA maturation or fate determination.

An intriguing aspect to explore in the future is whether RZ-1B/1C also affect the transcriptional elongation rate, as is the case for some mammalian SR proteins (Ji et al., 2013; Lin et al., 2008; Xiao et al., 2019). Our preliminary analysis supports this hypothesis. We observed a steeper 5’ to 3’ slope of CB-RNA in the rz-1b rz-1c double mutant compared with Col-0 (Supplementary Fig. 10). As we explained in Fig. 1B, the shape of the 5’ to 3’ slope is tuned by the Pol II firing rate, elongation rate and termination rate. Notably, a change in initiation rate alone would not affect the shape of the 5’ to 3’ slope. Therefore, our data are consistent with the hypothesis that RZ-1B/1C affect Pol II dynamics globally. Notably, Pol II travels at uneven speed across a gene (e.g. pausing at the exon-intron boundary); therefore, techniques like pNET-seq will be required in the future to reveal the role of RZ-1B/1C in regulating Pol II activity at high spatial resolution, which may further link to their RNA binding and their roles in CTS regulation.

In mammals, introns are often an order of magnitude longer than exons; therefore, splicing was considered to happen through initial recognition of an exon and its adjacent splice sites, a mechanism known as “exon definition” (De Conti et al., 2013). In contrast, plant gene splicing is generally considered to occur by intron definition, given the abundant retention of introns and their shorter length compared with mammals (Reddy, 2007). According to the intron definition model, introns and their 5’ and 3’ splice sites can be determined by the spliceosome. Apart from the apparent difference in intron length, what contributes to and determines intron or exon definition is a still unsolved question. In mammals, splicing factors such as SR proteins often bind to exonic splicing enhancers (ESEs) located in exons and therefore contribute to exon definition. Interestingly, as we reported in this study, RZ-1C binds mainly to exons at the majority of its targets, although both its exonic and intronic binding are important for the regulation of CTS. Currently, it is unclear whether intron definition in plants is accompanied by a more frequent presence of regulatory motifs in introns. Systematic analysis of the RNA-binding features of splicing factors through eCLIP-seq, a reliable and powerful method as adapted in this study, would likely provide an unbiased answer to this question. Notably, a recent study in yeast showed that the E complex of the spliceosome can adopt both intron and exon definition modes without significant conformational change, and provided evidence that exon definition can also happen in yeast (Li et al., 2019), a species previously considered to use intron definition. Therefore, it would be interesting to test whether exon definition and intron definition co-exist in plants as well, in connection with the RNA-binding properties of splicing factors.

An unexpected discovery from our study is that exon and/or intron number correlated with efficient CTS. Consistent with our finding, a recent study based on long-read sequencing of chromatin-bound RNA revealed that a significant proportion of yeast nascent RNAs are either fully spliced or fully unspliced (Herzel et al., 2018). The removal of individual introns is likely to be coordinated instead of following a simple ‘first come, first served’ rule (Fededa et al., 2005; Tilgner et al., 2018). Although a contribution of introns does exist, our work highlights the role of exons in this mechanism. Exactly how multiple exons and introns promote CTS is an exciting next question to address. RZ-1B/1C seem to be part of this mechanism, as RZ-1B/1C-regulated exon-intron-exon units were more enriched in genes with a higher number of introns or exons (Fig. 6I). Besides RZ-1B/1C, other factors are likely also involved in this mechanism, as the trend between CTS efficiency and intron number remains when comparing RZ-1C-bound and unbound genes (Supplementary Fig. 11A), although CTS efficiency was generally higher for RZ-1C-bound genes (Supplementary Fig. 11B). Previous research showed that higher-order interactions exist among splicing factors and RNAs (Singh et al., 2012). This is consistent with the role of the RZ-1 proteins, which could promote CTS by being situated at exons and introns, even when distal to the affected intron. RZ-1C can form nuclear speckles and interact with multiple SR proteins (Wu et al., 2016b). Such interactions were confirmed in vivo through GFP-RZ-1C-IP-MS; in addition, multiple other splicing factors were also detected (Supplementary Table 4). An interesting hypothesis to test is whether there are multivalent interactions among exons, introns and splicing regulators such as RZ-1C and SR proteins, and whether such interactions could contribute to the more efficient CTS of genes with higher numbers of introns (Fig. 6I). The role of Pol II in this process is another unanswered question awaiting future study. Overall, our work has revealed the co-transcriptional nature of plant gene splicing and highlighted the role of exon/intron number, as well as the role of RZ-1 proteins, in this process.

Methods

Plant materials and growth conditions

The Col-0 ecotype of Arabidopsis thaliana was used as the wild-type control in this study. The rz-1b rz-1c double mutant and RZ-1C:GFP-RZ-1C in rz-1b rz-1c have been described previously (Wu et al., 2016b). All experiments used seedlings grown on 1/2 Murashige and Skoog (MS) medium for 10 d at 22 °C under 16 h light/8 h dark cycles.

CB-RNA extraction and sequencing library construction

CB-RNA extraction from Arabidopsis seedlings was performed as described previously (Bhatt et al., 2012; Wu et al., 2016a; Wuarin and Schibler, 1994) (Supplementary Fig. 1A) and followed the guidelines previously proposed (Herzel and Neugebauer, 2015). In brief, nuclei from 2 g of ground seedlings were prepared using Honda buffer supplemented with RNase inhibitor and yeast tRNA to protect the integrity of the RNA. The nuclei pellet was first resuspended in an equal volume of resuspension buffer (50 % glycerol, 0.5 mM EDTA, 1 mM DTT, 25 mM Tris-HCl pH 7.5, 100 mM NaCl, 100 ng/µl tRNA, 0.01 U/µl RNasin®), followed by brief washing with two volumes of urea wash buffer (25 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 M urea, 0.5 mM EDTA, 1 mM DTT, 1 % Tween-20). The chromatin was pelleted by centrifugation at 8000 g for 1 min and resuspended in an equal volume of resuspension buffer, followed by washing with one volume of urea wash buffer. The resulting chromatin pellet after centrifugation was resuspended in 1 mL Trizol, and RNA was extracted in combination with the RNeasy Mini Kit (Qiagen). The integrity of the resulting CB-RNA was checked by gel electrophoresis and comparison with the pattern of total RNA. DNA copurified with the CB-RNA was removed by two sequential digestions with TURBO DNase (Thermo). The resulting RNA was then used for further analysis.

For constructing the CB-RNA-seq library, the above CB-RNA was treated with Ribominus (Thermo) to remove rRNA following the kit instructions. Any contaminating polyadenylated RNAs were removed using Oligo d(T) cellulose beads (NEB). A strand-specific RNA-seq library was then constructed using the dUTP method. The resulting library was paired-end sequenced on the Illumina platform. A summary of all sequencing results is included in Supplementary Table 1.

CB-RNA and mRNA data analysis

The raw CB-RNA data were mapped to the Arabidopsis genome (TAIR10) with HISAT2 v2.0.5 (Pertea et al., 2016), and the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) of the genes were calculated by StringTie v1.3.4d (Pertea et al., 2016) with default parameters. Genes with an FPKM < 1 were filtered out, followed by the 5ʹSS or 3ʹSS ratio calculation for exon-intron-exon features to determine the co-transcriptional splicing frequency (Fig. 1G). Reads from CB-RNA-seq were counted if they mapped to the 25 bp upstream (5ʹ exonic reads) or downstream (5ʹ intronic reads) of the exon-intron junction, or to the 25 bp downstream (3ʹ exonic reads) or upstream (3ʹ intronic reads) of the intron-exon junction. The ratio of the 5ʹ intronic read number to the 5ʹ exonic read number was calculated as the 5ʹSS ratio (Fig. 1G). The 3ʹSS ratio was calculated in a similar way. Differentially spliced introns between the wild type (WT) and the double mutant were identified using Fisher’s test as previously reported (Deng et al., 2010). Briefly, Fisher’s test was performed to identify differential representation of an intron relative to its two flanking exons. The test used the normalized CB-RNA-seq or mRNA-seq read counts of each intron and its two flanking exons in the WT and rz-1b rz-1c, followed by Benjamini-Hochberg multiple comparison correction. An intron with an FDR less than 0.001 was considered differentially spliced. The ChIP-seq data of histone modifications from GSE28398 (Luo et al., 2013) were mapped with Bowtie v1.1.2 (Langmead et al., 2009), and only reads mapped to a unique position in the genome were kept for the following analysis. All histone modification data were normalized by the total number of reads. The raw data from mRNA-seq were mapped with TopHat v2.1.1 (Trapnell et al., 2012). Cufflinks v2.2.1 (Trapnell et al., 2012) was used to calculate the FPKM of genes. Cuffdiff v2.2.1 was used to identify significantly differentially expressed genes with the following criteria: FPKM > 1, fold change > 2 and FDR < 0.05.
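
The splice-site ratio and the differential splicing test described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: all function names are hypothetical, the 2x2 table layout (rows WT and mutant, columns intron and flanking-exon counts), the use of integer counts and the two-sided form of the test are assumptions, and only the Python standard library is used.

```python
from math import comb


def ss_ratio(intronic_reads, exonic_reads):
    """Intronic over exonic read count in the 25 bp windows at one junction."""
    if exonic_reads == 0:
        return None
    return intronic_reads / exonic_reads


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_introns(tables, fdr_cutoff=0.001):
    """tables: {intron_id: (wt_intron, wt_exon, mut_intron, mut_exon)}."""
    ids = list(tables)
    pvals = [fisher_exact_two_sided(*tables[i]) for i in ids]
    fdrs = benjamini_hochberg(pvals)
    return [i for i, q in zip(ids, fdrs) if q < fdr_cutoff]
```

Because Fisher's exact test operates on counts, normalized read counts would need to be rounded to integers before being passed to this sketch.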

The alternative splicing isoforms A3SS, A5SS, IR and ES were identified based on the TAIR10 gene annotation (Fig. 4). An individual AS event was kept for further analysis if it was supported by sequencing reads from our mRNA-seq data. "Percentage spliced in" (PSI), a widely used measure, was used to denote the fraction of alternative splicing isoforms (Katz et al., 2010). Briefly, PSI indicates the ratio of two isoforms expressed simultaneously in a tissue. A PSI towards 1 or 0 indicates that predominantly one isoform of an alternatively spliced gene was expressed. A PSI of 0.5 indicates equal expression of the two isoforms. PSI values were calculated using MISO v0.5.4 (Katz et al., 2010) in the CB-RNA and mRNA datasets. Read coverage in this study was calculated as the number of reads per million total sequencing reads per base pair.
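
To make the two quantities above concrete, the following sketch computes a simple read-count PSI and the per-base coverage. Note the assumptions: MISO estimates PSI with a probabilistic model, so the plain ratio below is only a simplified stand-in, and the function and argument names are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Naive PSI: share of reads supporting the inclusion isoform."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no informative reads, PSI is undefined
    return inclusion_reads / total


def coverage_rpm_per_bp(read_count, total_reads, region_length_bp):
    """Reads per million total sequencing reads per base pair."""
    reads_per_million = read_count / (total_reads / 1_000_000)
    return reads_per_million / region_length_bp
```

For instance, 30 inclusion and 30 exclusion reads give a PSI of 0.5, the value that indicates equal expression of the two isoforms.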

To analyse the variance in CTS caused by intron number and gene length, a general linear model (GLM) was fitted as follows: CTS efficiency (calculated as the mean of the 5ʹ and 3ʹSS values in a gene) ~ intron number of the gene × gene length of the gene, followed by an ANOVA test. Only genes between 1 and 4 kb in length (10,382 of the 12,910 genes with available SS values) were used in the GLM analysis.

eCLIP and sequencing library construction

eCLIP was performed as previously described with modifications (Konig et al., 2010; Meyer et al., 2017; Van Nostrand et al., 2016; Zhang et al., 2015). In brief, plants grown on petri dishes for 10 d were gently harvested, immersed in a shallow layer (0.5 cm) of ice-cold 1x PBS and cross-linked with UV light (245 nm) at 600 mJ/cm² on ice, a dose proven to be sufficient for Arabidopsis seedlings (Meyer et al., 2017; Zhang et al., 2015). Two grams of liquid nitrogen-ground plant powder was lysed with lysis buffer, treated with DNase I/RNase I and sonicated to release the RNA/proteins. The resulting lysate was cleared by centrifugation and then subjected to immunoprecipitation with an anti-GFP antibody (Ab290, Abcam) and protein A beads (Dynabeads, Thermo). After IP, the RNA/protein complex was washed extensively, including several washes with high-salt wash buffer (1 M NaCl), treated with polynucleotide kinase and alkaline phosphatase to remove the terminal phosphate, ligated to an RNA adaptor at the 3ʹ end and end-labeled with ³²P. Ninety percent of the resulting complex was eluted using LDS sample buffer (Thermo), resolved on a 4–12 % NuPAGE (Thermo) gel and then transferred to a nitrocellulose membrane for autoradiography (~3 h). The remaining 10 % of the complex was run in parallel to check the enrichment of the target protein. The region above the size of the protein (from 70 kD to ~140 kD) was cut out based on the autoradiograph, and the RNA was released by proteinase K digestion. The resulting RNA was further purified and reverse transcribed using SuperScript III (Thermo) with a primer complementary to the RNA adaptor. The resulting cDNA was purified and subjected to DNA adaptor ligation at its 3ʹ end. The final library was amplified from the cDNA with primers specific to both end adaptors (FP-F and FP-R) using Q5 Hot-Start DNA polymerase. For the GFP-RZ-1C-eCLIP sample, typically 12 cycles were sufficient for final library amplification from cDNA. The resulting PCR products were purified and resolved on a 3 % low-melting-point agarose gel, and fragments of 175–400 bp were recovered from the gel and used for paired-end Illumina (PE150) sequencing. A summary of all sequencing results is shown in Supplementary Table 1. All oligos and adaptor sequences are listed in Supplementary Table 5.

eCLIP data analysis

eCLIP data were processed using iCount v2.0.1.dev (T. Curk, 2016). Given the principle of the eCLIP experiment, only R2 reads were used in binding site identification although paired-end sequencing results were generated. In brief, we used the “demultiplex” function, removed the adapter sequence from the reads and extracted the 10 bp random sequences, which were used to mark PCR duplicates. The clean reads were then aligned with the “mapstar” function. All binding sites were identified and quantified using the "xlsites" function (parameters: --group_by start --quant cDNA). Finally, the "peaks" function was used to identify significant binding sites (FDR < 0.05), and binding clusters were generated with the "clusters" function. For the five replicates generated by eCLIP, the clusters were identified in each replicate separately and in a pool of the five replicates. Only clusters from the pooled replicates that were also identified in all five replicates were retained as binding clusters. Only binding sites found in the binding clusters were kept for the subsequent analyses. MEME-ChIP (Machanick and Bailey, 2011) was used to find motifs enriched within the binding sites.
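
The replicate filter described above can be sketched as an interval-overlap check. The source does not state how a pooled cluster is matched to a replicate cluster, so the overlap criterion used here is an assumption, and the tuple layouts and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, strand, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] == b[1] and a[2] < b[3] and b[2] < a[3]


def replicate_supported(pooled, replicates):
    """Keep pooled clusters that overlap a cluster in every replicate."""
    kept = []
    for cluster in pooled:
        if all(any(overlaps(cluster, c) for c in rep) for rep in replicates):
            kept.append(cluster)
    return kept


def sites_in_clusters(sites, clusters):
    """Keep (chrom, strand, position) sites that fall inside a kept cluster."""
    return [
        s for s in sites
        if any(c[0] == s[0] and c[1] == s[1] and c[2] <= s[2] < c[3] for c in clusters)
    ]
```

The nested loops are quadratic in the number of clusters, which is acceptable for a small illustration; a genome-scale implementation would more likely sort the intervals or use an interval tree.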

qPCR validation of CB-RNA-seq data and mRNA-seq data

CB-RNA was extracted and purified as described above. mRNA was purified from total RNA using the Dynabeads mRNA Purification Kit (Thermo). In brief, 500 ng of chromatin-bound RNA was treated twice with TURBO DNase (Thermo) and used for cDNA synthesis with gene-specific primers and SuperScript III (Thermo) at 52 ºC for 1 h. In the case of mRNA, 50 ng of mRNA was used for cDNA synthesis. Two primer pairs were designed to measure the splicing efficiency of each exon-intron-exon structure: one primer pair spanned the exon-exon junction to amplify spliced transcripts only, and the other spanned the intron-exon junction to amplify unspliced transcripts only. All the reverse primers were included in the reverse transcription reaction to ensure gene- and strand-specific detection of RNA. The resulting cDNA was diluted 30-fold with water and used for quantitative PCR on a qTOWER3 84 (Jena) with SYBR Green Master Mix (Roche). For data normalization, the value obtained with the spliced primer pair was normalized to the value obtained with the unspliced primer pair. The values from three independent samples were normalized to the mean WT level. Final values are means ± S.E.M. from three independent samples. A ‘no reverse transcriptase’ control (- RT) was used to ensure that the values reflected the level of RNA and not DNA contamination. Primers used for cDNA synthesis and quantitative PCR are described in Supplementary Table 5.
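
The normalization steps above can be written out as a short sketch. The function names are hypothetical, the inputs are assumed to be relative quantities already derived from the qPCR run, and the S.E.M. is computed from the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def splicing_ratios(spliced, unspliced):
    """Spliced over unspliced signal for each independent sample."""
    return [s / u for s, u in zip(spliced, unspliced)]


def normalize_to_wt(sample_ratios, wt_ratios):
    """Express each ratio relative to the mean wild-type ratio."""
    wt_mean = mean(wt_ratios)
    return [r / wt_mean for r in sample_ratios]


def mean_sem(values):
    """Return the mean and standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))
```

A mutant value below 1 after this normalization would indicate a lower spliced-to-unspliced ratio than in the wild type, that is, less efficient splicing of the tested intron.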

RNA immunoprecipitation

Nuclei from 2 g of ground seedlings were prepared with Honda buffer as described in CB-RNA extraction and sequencing library construction. Nuclei pellets were resuspended in 2.5 volumes of Nuclei Lysis Buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1 % SDS, 1x protease inhibitor cocktail, 50 ng/µl tRNA) and sonicated in a Bioruptor (Diagenode) for 15 cycles (30 s on/30 s off). Immunoprecipitation was performed by incubating 50 µL Dynabeads protein A (Thermo), 5 µg antibody (anti-GFP, AB290, Abcam) and 1.2 mL diluted chromatin (containing 100 µL sonicated chromatin) at 4 ºC for 1.5 h. After the IP, the beads were washed four times in washing buffer (167 mM NaCl, 16.7 mM Tris pH 7.5, 1.2 mM EDTA, 0.8 % Triton X-100, 1x protease inhibitor, 20 U/mL RNase inhibitor). The RNA-protein complex was eluted and reverse cross-linked by adding 200 µL elution buffer (2 mM EDTA, 0.2 % SDS, Proteinase K) to the washed beads and incubating at 55 ºC overnight. The RNA was precipitated, dissolved, DNase treated and then used as a template for reverse transcription with gene/strand-specific primers. The values presented in the figure are means ± S.E.M. from three independent samples and are shown as IP/1 % of input (RNA). All the primers used are listed in Supplementary Table 5.

Immunoprecipitation and Mass Spectrometry

Immunoprecipitation and mass spectrometry analysis were done as previously described with modifications (Questa et al., 2016; Wang et al., 2014). For immunoprecipitation, 5 g of RZ-1C:GFP-RZ-1C and 35S:GFP seedlings were harvested and ground to a fine powder in liquid nitrogen. Two volumes of extraction buffer (20 mM Tris-HCl pH 8, 150 mM NaCl, 2.5 mM EDTA, 1 % Triton X-100 and 2x protease inhibitor (Roche)) were added to the ground powder, followed by gentle rotation at 4 °C for 10 min. The supernatant was recovered after three sequential centrifugations at 14000 rpm for 5 min. Pre-washed GFP-Trap agarose beads (Chromotek) were incubated with the supernatant for 2 h at 4 °C with rotation, followed by five washes with 1 ml extraction buffer. The resulting beads were boiled in 1x LDS sample buffer (Thermo). Five percent of the resulting sample was loaded on a 10 % SDS-PAGE gel and transferred to a PVDF membrane, followed by western blotting using an anti-GFP antibody (AB290, Abcam).

For mass spectrometry analysis, protein samples from the above steps were loaded onto an SDS-PAGE gel (10 % stacking gel) and run until all the loading dye had migrated into the gel. The band was cut out as a whole, washed, reduced and alkylated, followed by trypsin digestion as previously reported (Shevchenko et al., 2006). Peptides were extracted using 5 % formic acid/50 % acetonitrile, dried down and re-dissolved in 0.1 % TFA. LC-MS/MS analysis was done using a Synapt G2Si mass spectrometer (Waters) combined with a nanoAcquity™ UPLC™ system (Waters) running a BEH C18 column (1.7 µm, 75 µm x 250 mm, Waters). Raw files from the Synapt G2Si were processed with the Protein Lynx Global Server (PLGS 3.0.2) software (Waters) and searched against the TAIR10_pep_20101214 database (www.arabidopsis.org). Results were exported as Excel files using the Ion Accounting Output (IdentityE).

Data Availability

All data generated in this study were deposited in the public domain as follows: Gene Expression Omnibus data accession: GSE128619 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128619).

Acknowledgements

We thank Dr. Caroline Dean for her generous support of this work. We thank Dr. Hongwei Guo, Dr. Yu Zhou, Dr. Suomeng Dong, Dr. Robert Ietswaart and Dr. Zhicheng Dong for discussions and suggestions; Dr. Shaofang Li and Dr. Xuemei Chen for communication and coordination; and Dr. Xiaofeng Cao and Dr. Jernej Ule for suggestions on the CLIP experiment. This work was supported by the Guangdong Innovation Research Team Fund (2016ZT06S172), the Shenzhen Sci-Tech Fund (KYTDPT20181011104005) and the National Natural Science Foundation of China (31771365 to ZW and 31800268 to DLZ).

Author Contributions

ZW, YFW, HYG and LJQ designed the research; DLZ, FM, XYL and ZW performed the research; FM, DLZ, LFG, YFW and ZW analysed the data; ZW and YFW wrote the paper.

References

Alexander, R.D., Barrass, J.D., Dichtl, B., Kos, M., Obtulowicz, T., Robert, M.C., Koper, M., Karkusiewicz, I., Mariconti, L., Tollervey, D., et al. (2010). RiboSys, a high-resolution, quantitative approach to measure the in vivo kinetics of pre-mRNA splicing and 3'-end processing in Saccharomyces cerevisiae. RNA 16:2570-2580.

Alexandrov, A., Colognori, D., Shu, M.D., and Steitz, J.A. (2012). Human spliceosomal protein CWC22 plays a role in coupling splicing to exon junction complex deposition and nonsense-mediated decay. Proc Natl Acad Sci U S A 109:21313-21318.

Ameur, A., Zaghlool, A., Halvardson, J., Wetterbom, A., Gyllensten, U., Cavelier, L., and Feuk, L. (2011). Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat Struct Mol Biol 18:1435-1440.

Aslanzadeh, V., Huang, Y., Sanguinetti, G., and Beggs, J.D. (2018). Transcription rate strongly affects splicing fidelity and cotranscriptionality in budding yeast. Genome Res 28:203-213.

Barbosa, I., Haque, N., Fiorini, F., Barrandon, C., Tomasetto, C., Blanchette, M., and Le Hir, H. (2012). Human CWC22 escorts the helicase eIF4AIII to spliceosomes and promotes exon junction complex assembly. Nat Struct Mol Biol 19:983-990.

Bauren, G., and Wieslander, L. (1994). Splicing of Balbiani ring 1 gene pre-mRNA occurs simultaneously with transcription. Cell 76:183-192.

Bentley, D.L. (2014). Coupling mRNA processing with transcription in time and space. Nat Rev Genet 15:163-175.

Beyer, A.L., and Osheim, Y.N. (1988). Splice site selection, rate of splicing, and alternative splicing on nascent transcripts. Genes Dev 2:754-765.

Bhatt, D.M., Pandya-Jones, A., Tong, A.J., Barozzi, I., Lissner, M.M., Natoli, G., Black, D.L., and Smale, S.T. (2012). Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. Cell 150:279-290.

Brugiolo, M., Herzel, L., and Neugebauer, K.M. (2013). Counting on co-transcriptional splicing. F1000Prime Rep 5:9.

Clery, A., Jayne, S., Benderska, N., Dominguez, C., Stamm, S., and Allain, F.H. (2011). Molecular basis of purine-rich RNA recognition by the human SR-like protein Tra2-beta1. Nat Struct Mol Biol 18:443-450.

Church, M.C., and Fleming, A.B. (2018). A role for histone acetylation in regulating transcription elongation. Transcription 9:225-232.

Das, R., Yu, J., Zhang, Z., Gygi, M.P., Krainer, A.R., Gygi, S.P., and Reed, R. (2007). SR proteins function in coupling RNAP II transcription to pre-mRNA splicing. Mol Cell 26:867-881.

de Almeida, S.F., and Carmo-Fonseca, M. (2014). Reciprocal regulatory links between cotranscriptional splicing and chromatin. Semin Cell Dev Biol 32:2-10.

De Conti, L., Baralle, M., and Buratti, E. (2013). Exon and intron definition in pre-mRNA splicing. Wiley Interdiscip Rev RNA 4:49-60.

de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M., Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow RNA polymerase II affects alternative splicing in vivo. Mol Cell 12:525-532.

de la Mata, M., and Kornblihtt, A.R. (2006). RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat Struct Mol Biol 13:973-980.

de la Mata, M., Lafaille, C., and Kornblihtt, A.R. (2010). First come, first served revisited: factors affecting the same alternative splicing event have different effects on the relative rates of intron removal. RNA 16:904-912.

Deng, X., Gu, L., Liu, C., Lu, T., Lu, F., Lu, Z., Cui, P., Pei, Y., Wang, B., Hu, S., et al. (2010). Arginine methylation mediated by the Arabidopsis homolog of PRMT5 is essential for proper pre-mRNA splicing. Proc Natl Acad Sci U S A 107:19114-19119.

Dolata, J., Guo, Y., Kolowerzo, A., Smolinski, D., Brzyzek, G., Jarmolowski, A., and Swiezewski, S. (2015). NTR1 is required for transcription elongation checkpoints at alternative exons in Arabidopsis. EMBO J 34:544-558.

Drexler, H.L., Choquet, K., and Churchman, L.S. (2019). Human co-transcriptional splicing kinetics and coordination revealed by direct nascent RNA sequencing. bioRxiv 611020. https://doi.org/10.1101/611020.

Fededa, J.P., Petrillo, E., Gelfand, M.S., Neverov, A.D., Kadener, S., Nogues, G., Pelisch, F., Baralle, F.E., Muro, A.F., and Kornblihtt, A.R. (2005). A polar mechanism coordinates different regions of alternative splicing within a single gene. Mol Cell 19:393-404.

Godoy Herz, M.A., Kubaczka, M.G., Brzyzek, G., Servi, L., Krzyszton, M., Simpson, C., Brown, J., Swiezewski, S., Petrillo, E., and Kornblihtt, A.R. (2019). Light Regulates Plant Alternative Splicing through the Control of Transcriptional Elongation. Mol Cell.

Hanano, S., Sugita, M., and Sugiura, M. (1996). Isolation of a novel RNA-binding protein and its association with a large ribonucleoprotein particle present in the nucleoplasm of tobacco cells. Plant Mol Biol 31:57-68.

Herzel, L., and Neugebauer, K.M. (2015). Quantification of co-transcriptional splicing from RNA-Seq data. Methods 85:36-43.

Herzel, L., Straube, K., and Neugebauer, K.M. (2018). Long-read sequencing of nascent RNA reveals coupling among RNA processing events. Genome Res 28:1008-1019.

Huranova, M., Ivani, I., Benda, A., Poser, I., Brody, Y., Hof, M., Shav-Tal, Y., Neugebauer, K.M., and Stanek, D. (2010). The differential interaction of snRNPs with pre-mRNA reveals splicing kinetics in living cells. J Cell Biol 191:75-86.

Ietswaart, R., Rosa, S., Wu, Z., Dean, C., and Howard, M. (2017). Cell-Size-Dependent transcription of FLC and its antisense long non-coding RNA COOLAIR explain cell-to-cell expression variation. Cell Syst 4:622-635.

Jangi, M., and Sharp, P.A. (2014). Building robust transcriptomes with master splicing factors. Cell 159:487-498.

Jeong, S. (2017). SR Proteins: Binders, Regulators, and Connectors of RNA. Mol Cells 40:1-9.

Ji, X., Zhou, Y., Pandit, S., Huang, J., Li, H., Lin, C.Y., Xiao, R., Burge, C.B., and Fu, X.D. (2013). SR proteins collaborate with 7SK and promoter-associated nascent RNA to release paused polymerase. Cell 153:855-868.

774 775 77: 777 778 779 780 781 782 783 784 785 78: 787 788 789 790 791 792 793 794 795 79: 797 798 799 800 801 802 803 804 805 80: 807 808 809 810 811 812 813 814 815 81: 817 818 819 820 821 822 823 824 825 82: 827

Katz, Y., Wang, E.T., Airoldi, E.M., and Burge, C.B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7:1009-1015. Khodor, Y.L., Rodriguez, J., Abruzzi, K.C., Tang, C.H., Marr, M.T., 2nd, and Rosbash, M. (2011). Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila. Genes Dev 25:2502-2512. Khodor, Y.L., Menet, J.S., Tolan, M., and Rosbash, M. (2012). Cotranscriptional splicing efficiency differs dramatically between Drosophila and mouse. RNA 18:2174-2186. Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., Luscombe, N.M., and Ule, J. (2010). iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17:909-915. Kornblihtt, A.R. (2007). Coupling transcription and alternative splicing. Adv Exp Med Biol 623:175-189. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. Li, X., Liu, S., Zhang, L., Issaian, A., Hill, R.C., Espinosa, S., Shi, S., Cui, Y., Kappel, K., Das, R., et al. (2019). A unified mechanism for intron and exon definition and back-splicing. Nature 573:375-380. Lin, S., Coutinho-Mansfield, G., Wang, D., Pandit, S., and Fu, X.D. (2008). The splicing factor SC35 has an active role in transcriptional elongation. Nat Struct Mol Biol 15:819-826. Lorkovic, Z.J., and Barta, A. (2002). Genome analysis: RNA recognition motif (RRM) and K homology (KH) domain RNA-binding proteins from the flowering plant Arabidopsis thaliana. Nucleic Acids Res 30:623-635. Luo, C., Sidote, D.J., Zhang, Y., Kerstetter, R.A., Michael, T.P., and Lam, E. (2013). Integrative analysis of chromatin states in Arabidopsis identified potential regulatory mechanisms for natural antisense transcript production. Plant J 73:77-90. Machanick, P., and Bailey, T.L. (2011). 
MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27:1696-1697. Marondedze, C., Thomas, L., Serrano, N.L., Lilley, K.S., and Gehring, C. (2016). The RNA-binding protein repertoire of Arabidopsis thaliana. Sci Rep 6:29766. Marquardt, S., Raitskin, O., Wu, Z., Liu, F., Sun, Q., and Dean, C. (2014). Functional consequences of splicing of the antisense transcript COOLAIR on FLC transcription. Mol Cell 54:156-165. Meyer, K., Koster, T., Nolte, C., Weinholdt, C., Lewinski, M., Grosse, I., and Staiger, D. (2017). Adaptation of iCLIP to plants determines the binding landscape of the clock-regulated RNA-binding protein AtGRP7. Genome Biol 18:204. Monsalve, M., Wu, Z., Adelmant, G., Puigserver, P., Fan, M., and Spiegelman, B.M. (2000). Direct coupling of transcription and mRNA processing through the thermogenic coactivator PGC-1. Mol Cell 6:307-316. Naftelberg, S., Schor, I.E., Ast, G., and Kornblihtt, A.R. (2015). Regulation of alternative splicing through coupling with transcription and chromatin structure. Annu Rev Biochem 84:165-198. Nagai, S., Davis, R.E., Mattei, P.J., Eagen, K.P., and Kornberg, R.D. (2017). Chromatin potentiates transcription. Proc Natl Acad Sci U S A 114:1536-1541. Nojima, T., Gomes, T., Grosso, A.R.F., Kimura, H., Dye, M.J., Dhir, S., Carmo-Fonseca, M., and Proudfoot, N.J. (2015). Mammalian NET-Seq Reveals Genome-wide Nascent Transcription Coupled to RNA Processing. Cell 161:526-540. Nojima, T., Rebelo, K., Gomes, T., Grosso, A.R., Proudfoot, N.J., and Carmo-Fonseca, M. (2018). RNA Polymerase II Phosphorylated on CTD Serine 5 Interacts with the Spliceosome during Co-transcriptional Splicing. Mol Cell 72:369-379 e364. Oesterreich, F.C., Herzel, L., Straube, K., Hujer, K., Howard, J., and Neugebauer, K.M. (2016). Splicing of nascent RNA coincides with intron exit from RNA polymerase II. Cell 165:372-381. Osheim, Y.N., Miller, O.L., Jr., and Beyer, A.L. (1985). RNP particles at splice junction sequences on Drosophila chorion transcripts. 
Cell 43:143-151. Pandya-Jones, A., and Black, D.L. (2009). Co-transcriptional splicing of constitutive and alternative exons. RNA 15:1896-1908. 31

828 829 830 831 832 833 834 835 83: 837 838 839 840 841 842 843 844 845 84: 847 848 849 850 851 852 853 854 855 85: 857 858 859 8:0 8:1 8:2 8:3 8:4 8:5 8:: 8:7 8:8 8:9 870 871 872 873 874 875 87: 877 878 879 880 881

Pertea, M., Kim, D., Pertea, G.M., Leek, J.T., and Salzberg, S.L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650-1667. Questa, J.I., Song, J., Geraldo, N., An, H., and Dean, C. (2016). Arabidopsis transcriptional repressor VAL1 triggers Polycomb silencing at FLC during vernalization. Science 353:485-488. Reddy, A.S. (2007). Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annu Rev Plant Biol 58:267-294. Reddy, A.S., Marquez, Y., Kalyna, M., and Barta, A. (2013). Complexity of the alternative splicing landscape in plants. Plant Cell 25:3657-3683. Reddy, A.S., and Shad Ali, G. (2011). Plant serine/arginine-rich proteins: roles in precursor messenger RNA splicing, plant development, and stress responses. Wiley Interdiscip Rev RNA 2:875-889. Reichel, M., Liao, Y., Rettel, M., Ragan, C., Evers, M., Alleaume, A.M., Horos, R., Hentze, M.W., Preiss, T., and Millar, A.A. (2016). In Planta Determination of the mRNA-Binding Proteome of Arabidopsis Etiolated Seedlings. Plant Cell 28:2435-2452. Roberts, G.C., Gooding, C., Mak, H.Y., Proudfoot, N.J., and Smith, C.W. (1998). Co-transcriptional commitment to alternative splice site selection. Nucleic Acids Res 26:5568-5572. Rosa, S., Duncan, S., and Dean, C. (2016). Mutually exclusive sense-antisense transcription at FLC facilitates environmentally induced gene repression. Nat Commun 7:13031. Saldi, T., Cortazar, M.A., Sheridan, R.M., and Bentley, D.L. (2016). Coupling of RNA polymerase II transcription elongation with pre-mRNA splicing. J Mol Biol 428:2623-2635. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V., and Mann, M. (2006). In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 1:2856-2860. Singh, G., Kucukural, A., Cenik, C., Leszyk, J.D., Shaffer, S.A., Weng, Z., and Moore, M.J. (2012). The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus. 
Cell 151:915-916. Singh, J., and Padgett, R.A. (2009). Rates of in situ transcription and splicing in large human genes. Nat Struct Mol Biol 16:1128-1133. Staiger, D., and Brown, J.W. (2013). Alternative splicing at the intersection of biological timing, development, and stress responses. Plant Cell 25:3640-3656. Steckelberg, A.L., Boehm, V., Gromadzka, A.M., and Gehring, N.H. (2012). CWC22 connects pre-mRNA splicing and exon junction complex assembly. Cell Rep 2:454-461. T. Curk, G.R., Č. Gorup, J. Zmrzlikar, J. Konig, Y. Sugimoto, N. Haberman, G. Bobojević, C. Hauer, M. Hentze, B. Zupan, J. Ule,. (2016). iCount: protein-RNA interaction iCLIP data analysis (in preparation). Tilgner, H., Knowles, D.G., Johnson, R., Davis, C.A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T.R., and Guigo, R. (2012). Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22:1616-1625. Tilgner, H., Jahanbani, F., Gupta, I., Collier, P., Wei, E., Rasmussen, M., and Snyder, M. (2018). Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome. Genome Res 28:231-242. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562-578. Ullah, F., Hamilton, M., Reddy, A.S.N., and Ben-Hur, A. (2018). Exploring the relationship between intron retention and chromatin accessibility in plants. BMC Genomics 19:21. Van Nostrand, E.L., Pratt, G.A., Shishkin, A.A., Gelboin-Burkhart, C., Fang, M.Y., Sundararaman, B., Blue, S.M., Nguyen, T.B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13:508-514. 
Vargas, D.Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S.A., Schedl, P., and Tyagi, S. (2011). Single-molecule imaging of transcriptionally coupled and uncoupled splicing. Cell 147:1054-1065. 32

882 883 884 885 88: 887 888 889 890 891 892 893 894 895 89: 897 898 899 900 901 902 903 904 905 90: 907 908

Wang, Z.W., Wu, Z., Raitskin, O., Sun, Q., and Dean, C. (2014). Antisense-mediated FLC transcriptional repression requires the P-TEFb transcription elongation factor. Proc Natl Acad Sci U S A 111:7468-7473. Wozniak, G.G., and Strahl, B.D. (2014). Hitting the 'mark': interpreting lysine methylation in the context of active transcription. Biochim Biophys Acta 1839:1353-1361. Wu, Z., Ietswaart, R., Liu, F., Yang, H., Howard, M., and Dean, C. (2016a). Quantitative regulation of FLC via coordinated transcriptional initiation and elongation. Proc Natl Acad Sci U S A 113:218-223. Wu, Z., Zhu, D., Lin, X., Miao, J., Gu, L., Deng, X., Yang, Q., Sun, K., Zhu, D., Cao, X., et al. (2016b). RNA Binding Proteins RZ-1B and RZ-1C Play Critical Roles in Regulating Pre-mRNA Splicing and Gene Expression during Development in Arabidopsis. Plant Cell 28:55-73. Wuarin, J., and Schibler, U. (1994). Physical isolation of nascent RNA chains transcribed by RNA polymerase II: evidence for cotranscriptional splicing. Mol Cell Biol 14:7219-7225. Xiao, R., Chen, J.Y., Liang, Z., Luo, D., Chen, G., Lu, Z.J., Chen, Y., Zhou, B., Li, H., Du, X., et al. (2019). Pervasive Chromatin-RNA Binding Protein Interactions Enable RNA-Based Regulation of Transcription. Cell 178:107-121 e118. Xing, D., Wang, Y., Hamilton, M., Ben-Hur, A., and Reddy, A.S. (2015). Transcriptome-wide identification of RNA targets of Arabidopsis SERINE/ARGININE-RICH45 uncovers the unexpected roles of this RNA-binding protein in RNA processing. Plant Cell 27:3294-3308. Zhang, G., Taneja, K.L., Singer, R.H., and Green, M.R. (1994). Localization of pre-mRNA splicing in mammalian nuclei. Nature 372:809-812. Zhang, Y., Gu, L., Hou, Y., Wang, L., Deng, X., Hang, R., Chen, D., Zhang, X., Zhang, Y., Liu, C., et al. (2015). Integrative genome-wide analysis reveals HLP1, a novel RNA-binding protein, regulates plant flowering by targeting alternative polyadenylation. Cell Res 25:864-876. Zhu, J., Liu, M., Liu, X., and Dong, Z. (2018). 
RNA polymerase II activity revealed by GRO-seq and pNET-seq in Arabidopsis. Nat Plants 4:1112-1123.

909 910

Figure Legends

Fig. 1 Widespread CTS in Arabidopsis seedlings. A. A cartoon illustrating the principle of CB-RNA extraction. TSS: transcription start site; PAS: polyadenylation site. B. A diagram describing the two fractions of CB-RNAs. The pattern of elongating RNA (RNA e) along the gene is determined by the transcription initiation rate and the elongation rate, so its level depends on position x along the gene. RNA that has finished transcription but has not been released (RNA f) gives a flat pattern along the gene, at a level inversely proportional to the release rate of RNA. C. A diagram illustrating co-transcriptional splicing (upper panel) and the expected pattern of CB-RNA-seq across a gene with multiple introns (lower panel). For B and C, yellow circles and blue lines indicate Pol II and nascent RNA, respectively. D and E. Visualization of CB-RNA-seq and mRNA-seq results for three selected examples. A 5ʹ to 3ʹ declining slope is seen in the CB-RNA-seq profile (blue) but not in the mRNA-seq profile (red). The gene structure is indicated at the bottom as grey bars. F. Meta-gene analysis indicating widespread CTS. The panel shows CB-RNA-seq and mRNA-seq read coverage along exon-intron-exon features. G. A diagram showing the definition of the 5ʹSS and 3ʹSS ratios. H. Mean and distribution of 5ʹSS and 3ʹSS ratios in Arabidopsis seedlings (Col-0). For all box plots in this study, the box shows the median and the 25th and 75th percentiles, and the sample size is indicated where it is less than 1000.
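As a hedged illustration of Fig. 1B, a minimal steady-state sketch of the two CB-RNA fractions might read as follows. The symbols $F$ (initiation rate), $v$ (elongation rate), $k$ (release rate) and $L$ (TSS-to-PAS distance) are assumptions introduced here rather than the authors' notation, and the sketch assumes that all three rates are constant along the gene.

```latex
% Assumed notation (not from the source): F = initiation rate, v = elongation rate,
% k = release rate, L = TSS-to-PAS distance, x = position along the gene.
\begin{align}
  \mathrm{RNA}_e(x) &= \frac{F}{v}\,(L - x), \qquad 0 \le x \le L, \\
  \mathrm{RNA}_f    &= \frac{F}{k}, \\
  \text{CB-RNA}(x)  &= \mathrm{RNA}_e(x) + \mathrm{RNA}_f .
\end{align}
```

Under these assumptions the elongating fraction declines linearly from 5ʹ to 3ʹ, while the unreleased full-length fraction adds a constant offset. This matches the qualitative pattern described for the CB-RNA-seq profiles in D and E.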

Fig. 2 CTS efficiency is correlated with gene expression level and intron or exon numbers, independent of gene length.

A. The relationship between CTS efficiency and gene expression level. Genes were divided into five groups according to their expression level (measured as Fragments Per Kilobase of transcript per Million mapped reads, FPKM), from highest to lowest at 20 % intervals along the X-axis; the corresponding 5ʹSS and 3ʹSS values are shown on the Y-axis. B. The relationship between CTS efficiency and gene length. Genes were divided into five groups according to their length, from longest to shortest at 20 % intervals along the X-axis; the corresponding 5ʹSS and 3ʹSS values are shown on the Y-axis. C. The relationship between CTS efficiency and intron number. Gene groups with different intron numbers are shown along the X-axis. D. Scatter plot indicating the correlation between gene length and intron number. The intensity of blue colour indicates the density of data points. Black dots indicate outliers. E. The relationship between CTS efficiency and intron number at a fixed gene length. The panel is presented as in C except that genes with similar lengths were analysed. F. The relationship between CTS efficiency and gene length at fixed intron numbers. The panel is presented as in B except that genes with the same number of introns were analysed. The total sample sizes for genes with one, two and three introns are 1453, 1540 and 1445, respectively. G. The relationship between exon length and the CTS efficiency of the adjacent intron. Exons were divided into five groups at 20 % intervals according to their length and displayed along the X-axis. H. The relationship between intron length and the CTS efficiency of the intron itself. Introns were divided into five groups at 20 % intervals according to their length and displayed along the X-axis. The layout is the same as in G, except that introns rather than exons were analysed. Sample size is indicated where it is less than 1000.
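The 20 % grouping used in Fig. 2 (panels A, B, G and H) can be sketched as follows. This is a minimal illustration, assuming that an SS ratio is intronic reads divided by exonic reads at a splice site and that the groups are equal-sized quintiles. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import median


def ss_ratio(intronic_reads, exonic_reads):
    """Assumed definition: intronic / exonic read count at one splice site."""
    if exonic_reads <= 0:
        return None  # ratio is undefined without exonic coverage
    return intronic_reads / exonic_reads


def quintile_groups(items, key):
    """Sort items from highest to lowest key and split into five groups (20 % each)."""
    ranked = sorted(items, key=key, reverse=True)
    n = len(ranked)
    bounds = [round(i * n / 5) for i in range(6)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(5)]


# Hypothetical toy data: (gene id, FPKM, 5' intronic reads, 5' exonic reads)
genes = [
    ("g01", 120.0, 2, 40), ("g02", 95.0, 3, 30), ("g03", 60.0, 4, 20),
    ("g04", 44.0, 5, 20), ("g05", 30.0, 6, 15), ("g06", 21.0, 6, 12),
    ("g07", 12.0, 8, 10), ("g08", 8.0, 9, 10), ("g09", 3.0, 9, 9),
    ("g10", 1.0, 10, 8),
]

# Group genes by expression, then summarize the SS ratio within each group
groups = quintile_groups(genes, key=lambda g: g[1])
group_medians = [
    median(ss_ratio(g[2], g[3]) for g in group) for group in groups
]
for rank, value in enumerate(group_medians, start=1):
    print(f"expression group {rank} (1 = highest): median 5' SS ratio = {value:.3f}")
```

The same `quintile_groups` helper could be reused with gene, exon or intron length as the key; only the sort key changes.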

Fig. 3 Relationships between different histone modifications and CTS efficiency.

Levels (Y-axis) of H3K4me3 (left panels), H3K9ac (middle panels) and H3K27me3 (right panels) that correspond to different gene features (X-axis) are indicated. Different line colours indicate the groups of introns classified by their average 5ʹSS (top panels) or 3ʹSS (bottom panels) ratios, from highest to lowest with 20 % intervals between adjacent groups. Note that CTS efficiency negatively correlates with H3K4me3 and H3K9ac levels, whereas it shows no correlation with the level of H3K27me3.

Fig. 4 Alternative splicing events are often determined co-transcriptionally.

A. A diagram illustrating the different alternative splicing events analysed. B. To quantify a given alternative splicing event, the Percent Spliced In index (PSI, see Methods) was calculated for mRNA-seq and CB-RNA-seq data and is shown on scatter plots. The PSI values from the two data sets are highly correlated. The intensity of blue colour indicates the density of data points. Black dots indicate outliers. C. Examples of alternative splicing events that are determined either co-transcriptionally (left) or post-transcriptionally (right). D. A comparison of SS ratios between introns involved in alternative splicing and other introns from the same group of genes. Alternative splicing (AS) events detected in mRNA-seq data were classified into groups according to the AS types in A, and their corresponding SS ratios were calculated and displayed in the first box plot on the left-hand side. The average (ave), minimum (min) and maximum (max) SS ratios from the same group of genes were also calculated and are shown along the X-axis. The number (N) of alternative splicing events analysed is shown at the top of each chart. A horizontal line was added to help visualize differences in median level between box plots. p values above each box plot were calculated relative to the first box plot on the left-hand side using the Wilcoxon test.
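The comparison in Fig. 4B can be sketched as follows. This is a minimal illustration that assumes the common definition of PSI as inclusion reads divided by the sum of inclusion and exclusion reads; the study's own definition is given in its Methods and may differ. The function names and read counts are hypothetical.

```python
from math import sqrt


def psi(inclusion_reads, exclusion_reads):
    """Assumed PSI: inclusion / (inclusion + exclusion); None if there are no reads."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total > 0 else None


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (inclusion, exclusion) read counts per event in each library
cb_rna_counts = [(80, 20), (10, 90), (50, 50), (70, 30)]
mrna_counts = [(90, 10), (5, 95), (60, 40), (30, 70)]

cb_psi = [psi(i, e) for i, e in cb_rna_counts]
mrna_psi = [psi(i, e) for i, e in mrna_counts]
r = pearson_r(cb_psi, mrna_psi)
print(f"Pearson r between CB-RNA-seq and mRNA-seq PSI: {r:.2f}")
```

In this toy setting, an event whose PSI is similar in both libraries sits near the diagonal of the scatter plot, whereas an event whose PSI changes between chromatin-bound RNA and mRNA sits away from it.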

Fig. 5 RZ-1B and RZ-1C globally promote efficient CTS.

A. Distribution of introns according to their 3ʹSS ratios in Col-0 (WT) and in the rz-1b rz-1c double mutant. B. Box plot showing 5ʹSS and 3ʹSS values in CB-RNA for 6,119 exon-intron-exon units that were differentially spliced in rz-1b rz-1c compared with Col-0. C. Box plot showing 5ʹSS and 3ʹSS values in mRNA for the same set of exon-intron-exon units as in B. For B and C, p values were calculated between the red and green box plots using a two-tailed t-test. D. A diagram illustrating the principle of eCLIP-seq. E. Autoradiograph indicating successful harvesting of RZ-1C-bound RNA after IP and denaturing gel electrophoresis in the RZ-1C eCLIP experiment. The signal comes from 32P end-labelling of RNA. High-concentration RNase treatment released the long RNAs associated with RZ-1C, and therefore only a band corresponding to the molecular weight of GFP-RZ-1C is observed. F. Distribution of eCLIP crosslink sites (XLS) at exons, introns, 5ʹUTRs and 3ʹUTRs. Numbers in brackets indicate the number of sites in each category. G. Overlap between genes with RZ-1C-eCLIP XLS and genes with altered CTS efficiency (same set of genes as shown in B and C). H and I. Two examples of genes that show both RZ-1C binding and reduced splicing efficiency. Sequencing results from CB-RNA, mRNA, GFP-RZ-1C-eCLIP and GFP-eCLIP are shown as different panels in H, and affected introns are shown as grey boxes. qPCR validation of altered splicing efficiency and of RZ-1C RNA binding (by RIP-qPCR, see Methods) is shown in I; primer locations are indicated at the top of each bar chart. Data are presented as mean ± s.e.m. (n = 3). Asterisks indicate a significant difference based on a two-tailed t-test (p < 0.05).
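The "mean ± s.e.m. (n = 3)" summary used for the qPCR data in Fig. 5I can be sketched as follows. This is a minimal illustration that assumes the s.e.m. is the sample standard deviation divided by the square root of the number of replicates. The replicate values are hypothetical and are not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical unspliced/spliced qPCR ratios from three replicates
col0 = [0.10, 0.12, 0.11]
mutant = [0.28, 0.31, 0.25]

for label, values in (("Col-0", col0), ("rz-1b rz-1c", mutant)):
    m, sem = mean_sem(values)
    print(f"{label}: {m:.3f} ± {sem:.3f} (n = {len(values)})")
```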

Fig. 6 RZ-1B and RZ-1C promote CTS through both local and global modes.

A. Distribution of GFP-RZ-1C-eCLIP XLS along the gene. Different line colours indicate groups of different expression levels as judged by mRNA-seq data. B. Distribution of GFP-RZ-1C-eCLIP XLS along the exon-intron-exon structure. The number of XLS was normalized to the read density in CB-RNA-seq and is shown as a log value on the Y-axis. Different line colours indicate groups with different binding patterns. C. Distribution of RZ-1C-bound genes based on the position of XLS in those genes. D. Box plot showing the abundance of RZ-1C-bound and unbound introns according to CB-RNA-seq. E. Box plot showing the enrichment of exon-intron junction reads and exon-exon junction reads in GFP-RZ-1C-eCLIP. Only reads with XLS in exons were counted. At each exon-exon or exon-intron junction, the eCLIP read density was normalized to the read density in CB-RNA-seq and is shown as a log value on the Y-axis. For D and E, p values were calculated between the red and green box plots using the Wilcoxon test. F. MEME-ChIP analysis identified the top eight motifs enriched within ±10 bp windows of eCLIP XLS. G. Top, a gene model illustrating an RZ-1C binding target, with the splicing-affected (in the rz-1b rz-1c mutant) exon-intron-exon units shown as black boxes and the random exon-intron-exon units as grey boxes. Bottom, the black bars summarize the GFP-RZ-1C-eCLIP XLS distribution in the splicing-affected units, while the grey bars summarize the GFP-RZ-1C-eCLIP XLS distribution in units that were randomly taken from other regions of the same genes. The results are summarized using the different types of eCLIP reads displayed along the X-axis. p values were calculated using Fisher's exact test. H. A gene model emphasising the splicing-affected (in the rz-1b rz-1c mutant) exon-intron-exon units with or without RZ-1C binding. The black bar indicates the ratio of splicing-affected units with RZ-1C binding, while the grey bar indicates the ratio of splicing-affected units without RZ-1C binding. The results are summarized using the different types of eCLIP reads displayed along the X-axis. I. Distribution of genes that are less effectively spliced in rz-1b rz-1c (red) according to their intron numbers. The distribution of the remaining genes in the genome is shown in grey. J. A cartoon illustrating the proposed working model of RZ-1B and RZ-1C in promoting CTS. In the local mode, RZ-1C promotes splicing by binding to the affected intron or an exon adjacent to it. In the global mode, RZ-1C promotes splicing by binding to exons that may be either proximal or distal to the affected intron. Pol II, RNA polymerase II.
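The normalization described for Fig. 6B and E and the test named in Fig. 6G can be sketched as follows. This is a minimal illustration: the base-2 logarithm, the pseudocount, the two-sided form of Fisher's exact test and all counts are assumptions made here, not details taken from the study.

```python
from math import comb, log2


def log2_enrichment(eclip_density, cb_rna_density, pseudocount=1e-6):
    """log2 of eCLIP read density over CB-RNA-seq read density.

    The pseudocount (an assumption) avoids division by zero.
    """
    return log2((eclip_density + pseudocount) / (cb_rna_density + pseudocount))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def table_prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(
        p for p in (table_prob(k) for k in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: units with / without XLS among affected and random units
affected_bound, affected_unbound = 8, 2
random_bound, random_unbound = 1, 5
p_value = fisher_exact_two_sided(
    affected_bound, affected_unbound, random_bound, random_unbound
)
print(f"log2 enrichment: {log2_enrichment(0.04, 0.01):.2f}")
print(f"Fisher exact p = {p_value:.4f}")
```

Normalizing to CB-RNA-seq density is what lets binding be compared between abundant and rare nascent RNA regions. Without it, highly covered regions would appear enriched simply because more RNA is available to crosslink.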


[Figure 1. Panels A-H. Recoverable labels: CB-RNAs = RNA e + RNA f (elongation RNA, RNA e; full-length RNA, RNA f); position x between TSS and PAS; gene models At1g03060.1, At1g03070.2, At1g03080.1 and At1g67140.3 with CB-RNA and mRNA tracks; 5ʹ SS and 3ʹ SS ratios defined from intronic and exonic reads (25 bp) at each splice site; axes: reads coverage (CB-seq, mRNA-seq) and SS ratio (Col-0).]

[Figure 2. Panels A-H. Recoverable labels: SS ratio (3ʹ SS, 5ʹ SS) against gene FPKM (highest to lowest), gene length (longest to shortest), intron number (1, 2, 3, 4, 5, >5), exon length and intron length; gene length (kb) against intron number, r = 0.81; subpanels for gene length 1, 2 and 3 ± 0.2 kb and for intron number one, two and three.]

[Figure 3. Recoverable labels: histone modification level (Y-axis) for H3K4me3, H3K9ac and H3K27me3 across exon-intron-exon features, with intron groups ranked from highest to lowest 5ʹ SS or 3ʹ SS ratio.]

[Figure 4. Panels A-D. Recoverable labels: A, exon skipping (ES), alternative 5ʹ splicing site (A5SS), alternative 3ʹ splicing site (A3SS), intron retention (IR); B, PSI in mRNA-seq and CB-RNA-seq: A3SS N=1059, r=0.87; A5SS N=597, r=0.9; ES N=489, r=0.89; IR N=887, r=0.8; C, isoform models of At5g53860 and At4g25500 with PSI values for CB-RNA and mRNA, labelled as co-transcriptional and post-transcriptional decisions; D, SS ratio box plots (3ʹ SS, 5ʹ SS) for ES, A3SS, A5SS and IR, each compared with ave, min and max.]

[Figure 5. Panels A-I. Recoverable labels: A, number of introns against 3ʹ SS ratio (Col-0, rz-1b rz-1c); B and C, SS ratio in CB-RNA and in mRNA; D, eCLIP workflow (UV crosslinking in vivo; stringent purification of the protein/RNA complex and adaptor ligation; proteinase K digestion leaves a polypeptide at the crosslink site; reverse transcription stops at the peptide; cDNA adaptor ligation and PCR; sequencing to pinpoint the crosslink site); E, low RNase I (1:700) and high RNase I (1:25), GFP-RZ-1C (~70 kD); F, XLS counts: exon (1980453), 5ʹUTR (130762), 3ʹUTR (94026), intron (90257); G, RZ-1C-bound genes (eCLIP) and differentially spliced genes in rz-1b rz-1c at chromatin, with counts 8365, 2572 and 474; H and I, AT4G27000 and AT1G17220: CB-RNA and eCLIP tracks, unspliced/spliced ratios, and RIP (IP as % of input) with primers P1 and P2.]

[Figure 6. Panels A-J. Recoverable labels: A, reads coverage from -1 kb of the TSS to +1 kb of the PAS, grouped by gene FPKM; B, log2 (iCLIP/CB-RNA) across upstream exon, intron and downstream exon for units with intronic XLS, units with only exonic XLS and all RZ-1C-bound units; C, numbers of genes bound by RZ-1C in three categories (XLS in exons only, in introns only, in both introns and exons), with values 68% (7496), 31% (3370) and 1% (71); D, CB-RNA reads coverage for introns bound or not bound by RZ-1C; E, log2 (iCLIP/CB-RNA) for exon-intron and exon-exon junctions; F, motif logos (bits) with E-values from 4.0e-428 to 2.6e-056; G and H, ratio of units showing RZ-1 binding and ratio of units showing differential splicing in rz-1c rz-1b, by read type (exon-intron junction, exon-exon junction, total); I, ratio of genes against intron number for less effectively spliced genes in rz-1b rz-1c and other genes; J, model showing RZ-1B, RZ-1C, SR proteins and Pol II, with global and local splicing regulation of affected and unaffected introns.]