Circular RNAs: Identification, biogenesis and function

Karoline K. Ebbesen, Jørgen Kjems, Thomas B. Hansen

PII: S1874-9399(15)00145-5
DOI: 10.1016/j.bbagrm.2015.07.007
Reference: BBAGRM 907

To appear in: BBA - Gene Regulatory Mechanisms

Received date: 9 April 2015
Revised date: 29 June 2015
Accepted date: 8 July 2015

Please cite this article as: Karoline K. Ebbesen, Jørgen Kjems, Thomas B. Hansen, Circular RNAs: Identification, Biogenesis and Function, BBA - Gene Regulatory Mechanisms (2015), doi: 10.1016/j.bbagrm.2015.07.007

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Circular RNAs: Identification, Biogenesis and Function

Karoline K Ebbesen1, Jørgen Kjems1, and Thomas B Hansen1*

1: Department of Molecular Biology and Genetics (MBG) and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark

*Corresponding Author: Thomas B Hansen
Email: [email protected]
Phone: +45 20733619

Keywords: circular RNA; circRNA

ABSTRACT

Circular RNAs are a novel class of non-coding RNA characterized by the presence of a covalent bond linking the 3’ and 5’ ends generated by backsplicing. Circular RNAs are widely expressed in a tissue- and developmental stage-specific pattern, and a subset displays conservation across species. Functional circRNAs have been shown to act as cytoplasmic microRNA sponges and RNA-binding protein sequestering agents as well as nuclear transcriptional regulators, illustrating the relevance of circular RNAs as participants in the regulatory networks governing gene expression. Here, we review the features that characterize circular RNAs, discuss putative circular RNA biogenesis pathways and review the functions uncovered for circular RNAs.

INTRODUCTION

Long non-coding RNAs (lncRNAs) are a diverse class of transcripts whose common feature is that they do not function as vehicles for translation of proteins [1]. Instead, lncRNAs function as regulators of protein-coding gene expression [1,2,3]. The modulation mediated by lncRNAs can take place at every step in the gene expression pathway from transcription to translation as well as through regulation of resulting protein function and involves a wide range of mechanisms [1,2,3,4]. The mechanisms discovered to date span from lncRNAs serving as guides for regulatory proteins to lncRNAs that act as molecular scaffolds, thereby facilitating formation of active regulatory complexes. Additionally, lncRNAs can act as target decoys by redirecting binding of either microRNAs (miRNAs) or DNA-/RNA-binding proteins (DBPs/RBPs) from the intended target to the lncRNA, and they can bind to and allosterically modify the function of regulatory proteins [2,4]. Since correct and timely regulation of protein expression is essential for the survival and functionality of the cell, lncRNAs play an important role in the cell.

A novel class of lncRNAs recently recognized consists of circular RNAs (circRNAs). Although the presence of circRNAs in human cells was established more than twenty years ago [5,6,7,8,9,10], the prevalence and abundance of these circRNAs have only recently been revealed and appreciated [11,12,13]. CircRNAs are now known to be expressed in all tissues and eukaryotic organisms investigated thus far [11,12,13,14,15,16,17,18,19].

IDENTIFICATION

circRNA features

circRNAs differ structurally from other lncRNAs in that their 3’ and 5’ ends are not free but covalently joined. This joining takes place at a site flanked by canonical splice signals [11,12], and in order to yield a circRNA a splice donor has to be joined to an upstream splice acceptor. This so-called “backsplice” [11] or “head-to-tail splice” [12] is specific for circRNAs and differs from the regular splicing pattern, in which a splice donor is joined to a downstream splice acceptor. CircRNAs are mainly derived from the exons of protein-coding genes, although they can also arise from intronic, intergenic and UTR regions, ncRNA loci and locations antisense to known transcripts [11,12]. Furthermore, multiple circRNAs may arise from the same gene locus, a phenomenon termed alternative circularization [8,11,13,17], and may comprise only a single exon or encompass more than one [11,12]. As many multiexonic circRNAs consist exclusively of exonic sequences [12], these circRNAs must be subjected to splicing to remove the intervening introns either before or after circularization. However, a subclass of circRNAs coined exon-intron circRNAs (EIciRNAs) is characterized by the presence of a retained intron [20]. Additionally, these EIciRNAs are localized within the cell nucleus (as also observed for intron-retaining mRNA [21,22]), specifically close to the site of transcription [20]. This is in contrast to circRNAs in general, which are exported through an uncharacterized pathway from the nucleus to the cytoplasm [11].

Pipelines for circRNA identification

The reason why circRNAs have evaded recognition until now can be attributed to their structure, since most molecular techniques select specifically for poly-adenylated RNA species as a simple approach to deplete abundant ribosomal RNA. However, in this procedure circRNAs, which do not possess a poly(A) tail, are also lost [16]. Furthermore, non-linear sequence reads in large-scale RNA sequencing studies were viewed as technical artefacts and discarded during standard processing [16]. Hence, elucidation of circRNA abundance required application of bioinformatic pipelines directed to search specifically for circRNAs in datasets generated from deep sequencing of ribosomal RNA-depleted eukaryotic RNA [11,12,13]. Already, several different tools have been developed for circRNA identification based on high-throughput RNA-seq datasets: find_circ [12], MapSplice [23], CIRCexplorer [17], circRNAFinder [24], and CIRI [25]. A comprehensive review of these pipelines is outside the scope of this review, but generally these new pipelines take advantage of the structural feature that is unique to circRNAs in order to detect them: the presence of a sequence read spanning the site where the 3’ and 5’ ends of the transcript are covalently joined. Furthermore, as circRNAs are resistant toward the exonuclease RNase R [11,26] due to the lack of free ends, this treatment has been applied both to enrich for circRNAs in RNA-seq samples and to verify the circular nature of presumed circRNAs on northern blots or by RT-PCR [11,27].
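The junction-spanning read principle can be illustrated with a toy example. The sketch below is not the algorithm of any of the tools named above: the function name `find_backsplice`, the exact string matching (instead of a real aligner) and the made-up sequences are all assumptions for illustration. It splits a read into a head and a tail and reports a candidate backsplice when the tail maps upstream of the head, i.e. in the reverse of the linear order.

```python
from typing import Optional, Tuple


def find_backsplice(reference: str, read: str,
                    min_anchor: int = 8) -> Optional[Tuple[int, int]]:
    """Return (donor_end, acceptor_start) for a candidate backsplice read.

    Hypothetical helper: both read segments are located by exact string
    matching, and a junction is reported only when the tail of the read
    maps upstream of its head (out-of-order mapping).
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        head, tail = read[:split], read[split:]
        head_pos = reference.find(head)
        tail_pos = reference.find(tail)
        if head_pos == -1 or tail_pos == -1:
            continue  # one of the two anchors does not map at this split
        if tail_pos < head_pos:
            # Head ends at the donor; tail starts at the upstream acceptor.
            return head_pos + len(head), tail_pos
    return None


# Made-up sequences: one exon embedded between two flanking regions.
UPSTREAM = "TTGACCGTAC"
EXON = "ATGGCGTACGTTAGCCTAGGATCCGA"
DOWNSTREAM = "GGTTCAACTG"
REFERENCE = UPSTREAM + EXON + DOWNSTREAM

backsplice_read = EXON[-10:] + EXON[:10]  # exon end joined to exon start
linear_read = REFERENCE[5:25]             # ordinary collinear read

print(find_backsplice(REFERENCE, backsplice_read))
print(find_backsplice(REFERENCE, linear_read))
```

Real pipelines additionally have to handle mismatches, repeated sequence and ambiguous breakpoints, none of which this sketch attempts.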

BIOGENESIS

Although thousands of circRNAs have been identified and annotated in the circRNA repository (circBase [28]), the biogenesis of circRNAs remains elusive. As previously mentioned, circRNAs differ structurally from other lncRNAs in that their 3’ and 5’ ends are covalently joined. Because canonical splice signals flank the junction site in the examined circRNAs [11,12], the spliceosome has been implicated in the generation of these molecules [29]. Inhibition of the canonical spliceosome using isoginkgetin reduces the levels of both circRNAs and the spliced linear transcript [30], providing support for a role for the spliceosome in circRNA biogenesis. Additional evidence that the spliceosome is involved in circRNA biogenesis is derived from the observation that mutation of the backsplicing splice sites resulted either in circRNAs generated through the use of cryptic splice sites [29,30] or in diminished circRNA biogenesis [29]. CircRNA expression does not always correlate with the expression level of the linear transcript from which the circRNA is derived [12,14]. This indicates that expression of circRNAs is regulated and that the spliceosome must be able to discriminate between forward splicing, i.e. canonical linear splicing, and backsplicing. However, the mechanism(s) employed by the spliceosome to achieve such discrimination are still not understood. Nonetheless, elements that seem to enhance circularization have been identified, which has led to three non-exclusive propositions regarding the mechanism by which circularization can be induced (Fig. 1A). CircRNA generation in all three pathways is based on juxtaposition of the relevant splice sites, but the models differ as to how this proximity is achieved.

Biogenesis by cis-acting elements

The first pathway is the intron-pairing driven circularization model (Fig. 1A, Intron pairing-driven circularization), which argues that pairing between complementary motifs within the introns flanking the exons to be circularized causes a restricted structure that in turn induces circularization by bringing the splice sites to be joined into proximity [11]. Recent research has focused primarily on elucidating the importance of complementary flanking Alu elements present in the intronic regions for circularization, because a clear enrichment of such elements in introns flanking circularized exons compared to linearly spliced exons is observed in genome-wide analyses [11,17]. Detailed examination of the circularization process from three human gene loci producing circular RNA from exons flanked by Alu repeats (ZKSCAN1, HIPK3 and EPHB4) has been performed. In this analysis the exon(s) known to circularize and a portion of the flanking introns were incorporated into expression vectors. Subsequently these flanking sequences were successively deleted and circRNA production was measured, revealing that the presence of a flanking inverted repeat of only ~30-40 nt is sufficient to promote circularization. However, the specific sequence and the stability of the base-pairing stretches are important for the induction of circRNA formation. Surprisingly, very stable interactions as well as the presence of poly(A) stretches and G:U wobble pairs in the inverted repeat region result in reduced circularization [31]. This in part agrees with the observation that regions flanking circularized exons are associated with hyperediting, and that depletion of the editing enzyme ADAR1 correlates with higher circRNA expression, indicating that the A-to-I conversion producing a U:I mismatch pair in the RNA secondary structure destabilizes the flanking intron interaction and in turn reduces circRNA production [18].
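As a rough illustration of what a flanking inverted repeat means at the sequence level, the following sketch searches two intron sequences for the longest stretch that could form a perfect duplex. It is a simplification under stated assumptions: the function names and toy sequences are hypothetical, only Watson-Crick pairs are considered (no G:U wobble, no mismatches or bulges), and it is not the scoring used in any of the cited studies.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_inverted_repeat(upstream_intron: str, downstream_intron: str) -> str:
    """Longest upstream stretch that can pair perfectly with the downstream intron.

    Implemented as a longest-common-substring search between the upstream
    intron and the reverse complement of the downstream intron.
    """
    target = reverse_complement(downstream_intron)
    best_len, best_end = 0, 0
    previous = [0] * (len(target) + 1)
    for i in range(1, len(upstream_intron) + 1):
        current = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if upstream_intron[i - 1] == target[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return upstream_intron[best_end - best_len:best_end]


# Toy introns sharing one 10-nt inverted repeat (made-up sequences).
upstream = "TTTT" + "GATTACAGGC" + "TTTT"
downstream = "CCCC" + reverse_complement("GATTACAGGC") + "CCCC"
print(longest_inverted_repeat(upstream, downstream))
```

A length cut-off in the ~30-40 nt range mentioned above could be applied to the returned stretch, although duplex stability and sequence composition would also have to be taken into account.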

In many loci the circularized exons are flanked by several Alu repeat elements, prompting examination of the effect of having multiple repeat elements in one of the flanking introns. This investigation led to the conclusion that competitive base-pairing between different pairs of Alu repetitive elements influences the linear-to-circular RNA ratio, as the presence of two Alu elements within one intron reduced circRNA formation, presumably because this allowed intra-intron pairing to outcompete the circularization-promoting exon-spanning pairing [17].

Although Alu elements enhance circRNA formation, induction of circRNA formation does not depend specifically on the presence of Alu elements per se, but only requires the formation of an inverted repeat. This is evidenced by the fact that both Alu element-containing introns and non-Alu complementary regions promoted circularization to the same extent in a minigene expression vector setup [17]. Furthermore, this conclusion agrees with the observed generation of circRNA from sites flanked by non-Alu complementary regions (e.g. GCN1L1 [17]) and the generation of a functional expression vector for ciRS-7 capable of inducing circRNA production in the presence of an ~800 bp inverted repeat without Alu elements [27].

Nonetheless, the presence of inverted repeats is not the only factor influencing circRNA biogenesis, although 38 % of all known C. elegans circRNAs and 9 % of all known human circRNAs can be identified using a prediction of the base-pairing potential across the circle-forming exons [18]. However, this also means that the remaining 62 % and 91 % of the known circRNAs in C. elegans and humans, respectively, seem not to be flanked by introns with a strong base-pairing potential. Hence, the biogenesis of these remaining circRNAs cannot be explained by the intron-pairing pathway and must rely on non-repeat dependent circularization pathways. Additionally, many of the strong complementary motifs were present at sites not known to produce circRNAs: only 610 out of the top 20,000 intron pairs predicted to have a potential to base-pair across exons flank a known circRNA [18]. This further indicates that other factors may influence circRNA biogenesis in addition to the presence of complementary motifs. Similarly, another study on mammalian cells concluded that approximately 20 % of all circRNA loci are flanked by complementary Alu repeats [11], which potentially explains the biogenesis of these circRNAs. However, the biogenesis of the remaining 80 % of the circRNAs may then rely either on the presence of other repeats or on non-repeat dependent circularization pathways. Moreover, circRNAs identified in Drosophila, which lacks Alu elements, show no evidence of enriched duplex formation across circRNA-forming exons [24]. Instead, the circRNAs identified in this study have other characteristics: they were flanked by introns that are significantly longer than average [24], which is also the case for many mammalian circRNAs [11] as well as circRNAs identified in C. elegans [18]. However, at least in C. elegans both intron length and complementary motifs seem to be important characteristics of circRNAs, as the demarcation between circRNA-forming exons and linear exons was significantly improved when a prediction of the base-pairing potential across circRNA-producing exons was incorporated into the computational prediction instead of intron length [18]. Collectively, this suggests that at least for the predominant subset of Drosophila circles, and perhaps also for a substantial subset of mammalian circRNAs as well as circRNAs produced in C. elegans, biogenesis pathways independent of inverted repeats must be considered.
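The proportions quoted in this section follow from simple arithmetic, which the short sketch below re-derives. Only the input numbers (38 %, 9 %, 20 %, and 610 of 20,000) come from the text; the variable and function names are illustrative.

```python
def unexplained_percent(explained_percent: float) -> float:
    """Share of circRNAs NOT accounted for by the given explanation."""
    return 100.0 - explained_percent


# Percent of known circRNAs recovered by a base-pairing prediction [18]
# or flanked by complementary Alu repeats [11], as quoted in the text.
explained = {
    "C. elegans (base-pairing prediction)": 38.0,
    "human (base-pairing prediction)": 9.0,
    "mammalian cells (complementary Alu repeats)": 20.0,
}

for label, percent in explained.items():
    print(f"{label}: {unexplained_percent(percent):.0f} % unexplained")

# Fraction of the top 20,000 predicted intron pairs that flank a known circRNA.
flanking_known = 610
top_predicted = 20_000
hit_percent = 100.0 * flanking_known / top_predicted
print(f"{hit_percent:.2f} % of top-ranked intron pairs flank a known circRNA")
```

Put differently, roughly 3 % of the strongest predicted intron pairs coincide with a known circRNA, which is why complementarity alone appears insufficient as a predictor.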

Biogenesis by trans-acting factors

Recent studies have indicated that such additional biogenesis pathways may depend on RBPs. The splicing factors Quaking [19] and Muscleblind [29] are able to induce circularization from genes containing binding motifs for these RBPs in both of the introns flanking the exons subject to circularization. The proposed mode of action for both Quaking and Muscleblind is that binding of the specific RBP to such flanking intronic sequence motifs brings the circle-forming exons into proximity of each other through interaction between the bound RBPs (Fig. 1A, RBP pairing driven circularization) [19,29], hence promoting formation of the circle-forming junction. This mechanism is similar to the pathway described for intron-pairing driven circularization, except that RBPs induce the proximity between the splice sites instead of direct base-pairing between complementary motifs. However, binding of RBPs could also promote circularization either by stabilizing complementary sequences or by inhibiting canonical splicing.

Conn et al. demonstrated that the RBP Quaking (QKI) regulates human circRNA biogenesis, and that one third of the most abundant circRNAs in immortalized human mammary epithelial cells after epithelial-mesenchymal transition (mesHMLE cells) decreased upon QKI knockdown. Using a minigene reporter assay it was furthermore shown that the mechanism of QKI regulation involves binding of QKI to motifs present in both of the intronic regions flanking the circularized exons and that mutation of all QKI binding motifs or of individual QKI motifs abolished circRNA production. Additional analyses revealed that QKI knockdown only affected expression of circRNAs that possessed QKI binding motifs, whereas circRNAs without these motifs were unaffected. Furthermore, insertion of QKI binding motifs in both the upstream and downstream intron flanking the central exon in minigene expression vectors was enough to establish circRNA production from genes that do not otherwise give rise to circRNA [19].

Aside from QKI, the RBP Muscleblind (MBL) has been shown to enhance circularization, in this case from its own locus in Drosophila [29], giving rise to circMbl. This conclusion is supported by the observation that overexpression of MBL in Drosophila S2 cells specifically enhances generation of circMbl both from the endogenous mbl locus and from the mbl expression vector, whereas knockdown of MBL inhibits circMbl formation from these sites with equal specificity. Furthermore, relative circMbl production from the genomic muscleblind locus, in which the second exon is flanked by MBL binding sites, is high in Drosophila fly heads, which have abundant expression of MBL protein, but is low in the Drosophila S2 cell line, in which MBL is expressed at lower levels. Further examination revealed that MBL may also enhance circRNA expression from other loci, as evidenced by increased expression of circLuna upon overexpression of MBL and decreased expression of circHaspin upon MBL knockdown [29].

Another potential way in which RBPs in general could cause formation of circRNAs is by inducing exon skipping. According to this pathway, exon skipping, in which one or more exons are skipped and hence spliced out of the transcript, yields an exon-containing lariat (Fig. 1A, Lariat-driven circularization). The skipped exon(s) contained within the lariat are now in closer proximity and are recognized and joined by the spliceosome [11]. Hence, the process of circularization itself may depend only on intrinsic features of the exon-containing restricted lariat structure, although exon skipping, which gives rise to these lariats, is a highly regulated process that depends on trans-acting protein factors. However, circRNA formation along this pathway only takes place if splicing precedes the otherwise ensuing debranching of the lariat, as the splice sites necessary for circRNA production will no longer be juxtaposed following debranching. The circumstances under which such intralariat circRNA splicing would be favored over debranching remain undetermined. Furthermore, whether the primary object of exon skipping is always to generate variants of protein-coding transcripts or whether circRNA generation can be the prime purpose of exon skipping is as yet unsettled.

Experimental analyses of the correlation between exon skipping and circularization at both the single-gene and whole-genome level have revealed that although a number of linear exon-skipped transcripts have an associated circRNA consisting of the exons that have been skipped, this is not the case for all transcripts subject to alternative splicing [5,6,7,8,11,32,33]. Therefore, although the lariat-driven circularization mechanism may take place in vivo on a genome-wide basis, other pathways that generate circRNAs might be equally important.

Regulation of circRNA biogenesis by RBPs is in agreement with the overall low correlation observed between circRNA and host-gene abundance [14,34]. Furthermore, it provides a possible explanation for the differential circRNA expression seen during development as well as the observed cell-type specific expression pattern of circRNAs [14,19,34].

Co- or post-transcriptional biogenesis

In addition to the ongoing discussion on the factors influencing circularization, the transcriptional stage at which circularization takes place is also subject to debate. The fact that circRNAs could be detected in samples of nascent RNA isolated from Drosophila fly heads argues that circRNAs are most likely generated co-transcriptionally and that their generation is in competition with formation of canonical linear transcripts [29]. This notion is supported by the observation that circRNA formation decreased when the efficiency of canonical linear splicing was enhanced by the use of a mutated RNA polymerase II with a slower elongation rate [29]. In opposition to this is the finding that functional 3’ end processing is required for circularization, indicating that circRNA formation may take place post-transcriptionally. This conclusion was based on the observation that disruption of the poly(A) signal abrogated circularization. Circularization was restored upon insertion of the 3’ end processing signal from MALAT1, which does not contain a poly(A) tract, implying that the 3’ end processing specifically and not the presence of a poly(A) tract is essential for circularization [31].

In summary, the biogenesis of circRNAs is still not fully understood, although exon skipping, the presence of long introns, complementary regions and binding sites for specific RBPs may support circularization. Definitive clarification of how these elements regulate circRNA expression, as well as whether they can act in combination to enhance circRNA formation to a greater extent, is needed.

FUNCTION

While the abundance of circRNAs is now recognized, the role and biological importance of circRNAs as a class are currently uncharacterized, and their potential for functionality has been a matter of debate [12,14,15,16]. However, expression of circRNAs and circRNA isoforms is often cell-type, tissue and developmental stage specific, and a fraction of circRNAs show conservation across species, supporting the notion of circRNAs as functional molecules [12,14,15,19]. Furthermore, exonic sequences known to circularize seem to be more conserved at the third codon position, which is often redundant at the protein level, than linear exons. This is indicative of evolutionary constraints at the sequence level and hence suggests potential additional functionalities apart from encoding protein [12,15].

circRNAs as sponges or decoys

Although the function of circRNAs at large remains unidentified, the role of two circRNAs, ciRS-7/CDR1as and Sry circRNA, has been elucidated. Both ciRS-7/CDR1as and Sry circRNA act as inhibitors of miRNA function by binding (“sponging”) a specific miRNA, in these instances miR-7 and miR-138, respectively, hence acting as miRNA target decoys (Fig. 1B, miRNA sponge) [12,27].

ciRS-7/CDR1as harbours more than 70 miR-7 binding sites, and through tethering of this miRNA to ciRS-7/CDR1as, miR-7 activity is efficiently reduced. ciRS-7/CDR1as is known to be exceptionally abundant in brain throughout placental mammals, particularly in the cerebellum, but is detectable in most tissues [35]. The exact biological relevance of expressing a miRNA decoy to this extent is currently unknown. Sry circRNA, another highly expressed circRNA found in mouse testis, was also demonstrated to act as an inhibitor of miRNA activity using in vitro luciferase reporter assays, in this case of miR-138, for which it harbours 16 putative binding sites [27]. However, only a few circRNAs contain a substantial number of binding sites for a single miRNA, and it is currently debated whether miRNA inhibition is a general (but not exclusive) feature of circRNAs [15,16]. Supporting the notion of circRNAs as miRNA decoys is the recent observation that circRNA exons tend to be depleted of polymorphisms specifically at predicted miRNA binding sites [36]. In addition, highly conserved miRNA binding sites overlap with circRNA production in Drosophila [24], suggesting that circRNAs acting as miRNA sponges (or at least miRNA interactors) is a more widespread phenomenon. Whether circRNA abundance or target site numbers within the circRNA are consistent with an expected effect on miRNA activity awaits clarification. This is also a controversial matter for competing endogenous RNAs (ceRNAs) [37], which also bind miRNAs and function to titrate miRNAs away from the intended targets. Although both the linear ceRNAs and circRNAs regulate miRNA function, the efficiency with which this is carried out may differ. Hypothetically, circRNAs would be the more efficient sponge of the two, as the covalently closed structure of circRNAs leads to a higher transcript stability [11], presumably due to protection from exonuclease degradation, allowing circRNAs to accumulate as well as maintain the regulatory function for a longer period of time. Furthermore, as circRNAs lack both a 5’ cap and a 3’ poly(A) tract, this may impart resistance against the miRNA-mediated deadenylation, decapping and degradation normally observed upon miRNA binding to mRNAs [15,38], thereby also increasing the efficiency of circRNAs as sponges compared to ceRNAs. Additionally, the intrinsic resistance against miRNA-mediated destabilization emphasizes that circular miRNA sponges are not miRNA targets but instead miRNA regulators, whereas ceRNAs by virtue of their non-circular structure are themselves sensitive toward miRNA-mediated regulation.
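To make the notion of counting putative miRNA binding sites concrete, here is a minimal sketch. It assumes a deliberately simple site definition that is not taken from the text: a perfect match to the reverse complement of miRNA nucleotides 2-8 (the seed). The function names and both sequences are invented for illustration and are not the real miR-7, miR-138, ciRS-7 or Sry sequences.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Target-site sequence complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering positions 2..8
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seed))


def count_seed_sites(transcript: str, mirna: str) -> int:
    """Count (possibly overlapping) perfect seed matches in a transcript."""
    site = seed_site(mirna)
    count = 0
    start = transcript.find(site)
    while start != -1:
        count += 1
        start = transcript.find(site, start + 1)
    return count


# Invented toy sequences: the transcript carries two copies of the site.
toy_mirna = "UACGGAUCCAUGCAUGCAUGCA"
toy_circrna = "AAA" + seed_site(toy_mirna) + "CCC" + seed_site(toy_mirna) + "UUU"
print(seed_site(toy_mirna), count_seed_sites(toy_circrna, toy_mirna))
```

For a genuinely circular molecule, a site spanning the backsplice junction would be missed by this linear scan; appending the first few nucleotides of the sequence to its end before scanning is one way to account for that.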

Circular RNAs can also function as protein decoys (Fig. 1B, RBP sponge). In this regard, the circRNA coined circMbl, which is derived from the mbl locus in Drosophila, harbours binding sites for the MBL protein itself [29]. As described above, MBL is able to induce circMbl production. Generation of this circRNA consequently renders the pre-mRNA non-productive, limiting further production of MBL protein. CircMbl tethers and decoys MBL, which prevents further generation of circMbl and instead reactivates productive mbl mRNA production. Therefore, circMbl seems to be an integral component of an MBL auto-regulatory circuit. Furthermore, circMbl also encompasses highly conserved miRNA binding sites [24], and therefore circMbl could in fact have multi-faceted roles in the Drosophila brain.

circRNAs as transcriptional regulators

A novel subclass of circRNAs, EIciRNAs, was recently shown to enhance transcription of the gene from which they were derived through interaction with U1 snRNP and RNA polymerase II in the promoter region of the host-gene (Fig. 1B, Regulator of transcription). The interaction with U1 snRNP was direct and mediated by the presence of an intact splice donor sequence. This interaction was essential for the function of EIciRNAs, as decreased mRNA production from the host-gene was observed upon depletion of the EIciRNA in a manner dependent on U1 snRNP. Thus, these nuclear localized circRNAs have the potential to induce host-gene expression in cis. Given that EIciRNAs are not located exclusively at the site of transcription, they may potentially also act in trans to regulate other genomic loci. These observations resemble those obtained when studying the function of stable lariat structures [39]. Here, long-lived lariats were also shown to positively regulate gene expression in cis. Perhaps this is an example of two distinct but similar RNA species, lariats and circles, seemingly involved in the same regulatory feedback mechanism. Whether the closed topology associated with both lariats and circles only serves as a protective feature against exonucleolytic decay or whether it is essential for the regulatory potential is currently unknown.

In summary, the uncovered examples of circRNAs contributing to the regulatory network governing protein-coding gene expression by acting as miRNA target decoys, RBP sponges and transcriptional regulators exemplify a great biological potential for circRNA-related functionalities. This is further supported by the conserved nature of circRNA expression and the tissue-specific and regulated abundance. The currently disclosed functions of circRNAs mirror the functions of their linear non-coding counterparts [2,4], and putative other functions of circRNAs may similarly parallel the other biological activities uncovered for linear lncRNA [16,40,41]. Uncovering these additional functions (if any) and understanding them will be a key topic in the circRNA field of research in the years to come.

Acknowledgements

This work was supported by the Lundbeck Foundation (R151-2013-14555 to T.B.H.) and the Danish Council for Independent Research.

Legends

Figure 1: Biogenesis and function of circRNAs. A) Three pathways generating circRNAs have been suggested so far, all of which are able to give rise to both circular RNAs (circRNAs) consisting exclusively of coding sequences and circRNAs in which an intron has been retained (exon-intron ciRNAs, EIciRNAs). One model of biogenesis (1) involves the presence of complementary sequence motifs in the introns flanking the exons to be circularized. Direct base-pairing between these motifs positions the appropriate splice sites necessary for circularization within close proximity. In another model (2) interaction between RBPs bound to sequence motifs in both introns flanking the exon(s) to be circularized facilitates the head-to-tail end-joining. In both cases, either a circRNA or an EIciRNA, depending on the retention of internal introns (if any), is produced. In a third model of biogenesis (3) exon skipping leads to an mRNA consisting of exons 1 and 4 as well as a lariat structure containing the skipped exons 2 and 3. Due to the induced proximity between the splice donor of exon 3 and the splice acceptor of exon 2, intralariat splicing involving these two splice sites leads to the generation of an EIciRNA, and additional splicing of the second intron produces a circRNA. Whereas pathway 3 leads to generation of a linear product in addition to a circRNA, this linear product does not necessarily arise in pathways 1 and 2. B) The known functions of circRNAs consist of miRNA regulation through sequestration of AGO-miRNA complexes if the circRNA contains miRNA binding sites (4) or regulation of RBPs, as observed for Muscleblind (MBL), through binding of MBL to sites present in the circRNA (5). In contrast, exon-intron ciRNAs are retained in the nucleus and function to promote transcription of their host genes through direct interaction with U1 snRNP (U1) mediated by the 5’ splice site within the retained intron (6). The exon-intron ciRNA-U1 complex recruits RNA polymerase II (RNA pol II) to the promoter of the host gene, which stimulates transcription initiation.

References

[1] L. Yang, J.E. Froberg, J.T. Lee, Long noncoding RNAs: fresh perspectives into the RNA world, Trends Biochem Sci 39 (2014) 35-43.
[2] K.C. Wang, H.Y. Chang, Molecular mechanisms of long noncoding RNAs, Mol Cell 43 (2011) 904-914.
[3] J.-H. Yoon, K. Abdelmohsen, M. Gorospe, Posttranscriptional Gene Regulation by Long Noncoding RNA, J Mol Biol (2012).
[4] M. Guttman, J.L. Rinn, Modular regulatory principles of large non-coding RNAs, Nature 482 (2012) 339-346.
[5] A. Surono, Y. Takeshima, T. Wibawa, M. Ikezawa, I. Nonaka, M. Matsuo, Circular dystrophin RNAs consisting of exons that were skipped by alternative splicing, Hum Mol Genet 8 (1999) 493-500.
[6] C. Cocquerelle, B. Mascrez, D. Hétuin, B. Bailleul, Mis-splicing yields circular RNA molecules, FASEB J 7 (1993) 155-160.
[7] P.G. Zaphiropoulos, Circular RNAs from transcripts of the rat cytochrome P450 2C24 gene: correlation with exon skipping, Proc Natl Acad Sci U S A 93 (1996) 6536-6541.
[8] C.E. Burd, W.R. Jeck, Y. Liu, H.K. Sanoff, Z. Wang, N.E. Sharpless, Expression of linear and novel circular forms of an INK4/ARF-associated non-coding RNA correlates with atherosclerosis risk, PLoS Genet 6 (2010) e1001233.
[9] B. Capel, A. Swain, S. Nicolis, A. Hacker, M. Walter, P. Koopman, P. Goodfellow, R. Lovell-Badge, Circular transcripts of the testis-determining gene Sry in adult mouse testis, Cell 73 (1993) 1019-1030.
[10] J.M. Nigro, K.R. Cho, E.R. Fearon, S.E. Kern, J.M. Ruppert, J.D. Oliner, K.W. Kinzler, B. Vogelstein, Scrambled exons, Cell 64 (1991) 607-613.
[11] W.R. Jeck, J.A. Sorrentino, K. Wang, M.K. Slevin, C.E. Burd, J. Liu, W.F. Marzluff, N.E. Sharpless, Circular RNAs are abundant, conserved, and associated with ALU repeats, RNA 19 (2013) 141-157.
[12] S. Memczak, M. Jens, A. Elefsinioti, F. Torti, J. Krueger, A. Rybak, L. Maier, S.D. Mackowiak, L.H. Gregersen, M. Munschauer, A. Loewer, U. Ziebold, M. Landthaler, C. Kocks, F. le Noble, N. Rajewsky, Circular RNAs are a large class of animal RNAs with regulatory potency, Nature 495 (2013) 333-338.
[13] J. Salzman, C. Gawad, P.L. Wang, N. Lacayo, P.O. Brown, Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types, PLoS One 7 (2012) e30733.
[14] J. Salzman, R.E. Chen, M.N. Olsen, P.L. Wang, P.O. Brown, Cell-type specific features of circular RNA expression, PLoS Genet 9 (2013) e1003777.
[15] J.U. Guo, V. Agarwal, H. Guo, D.P. Bartel, Expanded identification and characterization of mammalian circular RNAs, Genome Biol 15 (2014) 409.
[16] W.R. Jeck, N.E. Sharpless, Detecting and characterizing circular RNAs, Nat Biotechnol 32 (2014) 453-461.
[17] X.O. Zhang, H.B. Wang, Y. Zhang, X. Lu, L.L. Chen, L. Yang, Complementary sequence-mediated exon circularization, Cell 159 (2014) 134-147.
[18] A. Ivanov, S. Memczak, E. Wyler, F. Torti, H.T. Porath, M.R. Orejuela, M. Piechotta, E.Y. Levanon, M. Landthaler, C. Dieterich, N. Rajewsky, Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals, Cell Rep 10 (2015) 170-177.
[19] S.J. Conn, K.A. Pillman, J. Toubia, V.M. Conn, M. Salmanidis, C.A. Phillips, S. Roslan, A.W. Schreiber, P.A. Gregory, G.J. Goodall, The RNA Binding Protein Quaking Regulates Formation of circRNAs, Cell 160 (2015) 1125-1134.
[20] Z. Li, C. Huang, C. Bao, L. Chen, M. Lin, X. Wang, G. Zhong, B. Yu, W. Hu, L. Dai, P. Zhu, Z. Chang, Q. Wu, Y. Zhao, Y. Jia, P. Xu, H. Liu, G. Shan, Exon-intron circular RNAs regulate transcription in the nucleus, Nat Struct Mol Biol 22 (2015) 256-264.
[21] Q. Xu, D. Walker, A. Bernardo, J. Brodbeck, M.E. Balestra, Y. Huang, Intron-3 retention/splicing controls neuronal expression of apolipoprotein E in the CNS, J Neurosci 28 (2008) 1452-1459.
[22] S. Sun, Z. Zhang, R. Sinha, R. Karni, A.R. Krainer, SF2/ASF autoregulation involves multiple layers of post-transcriptional and translational control, Nat Struct Mol Biol 17 (2010) 306-312.
[23] K. Wang, D. Singh, Z. Zeng, S.J. Coleman, Y. Huang, G.L. Savich, X. He, P. Mieczkowski, S.A. Grimm, C.M. Perou, J.N. MacLeod, D.Y. Chiang, J.F. Prins, J. Liu, MapSplice: accurate mapping of RNA-seq reads for splice junction discovery, Nucleic Acids Res 38 (2010) e178.
[24] J.O. Westholm, P. Miura, S. Olson, S. Shenker, B. Joseph, P. Sanfilippo, S.E. Celniker, B.R. Graveley, E.C. Lai, Genome-wide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation, Cell Rep 9 (2014) 1966-1980.
[25] Y. Gao, J. Wang, F. Zhao, CIRI: an efficient and unbiased algorithm for de novo circular RNA identification, Genome Biol 16 (2015) 4.
[26] H. Suzuki, Y. Zuo, J. Wang, M.Q. Zhang, A. Malhotra, A. Mayeda, Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing, Nucleic Acids Res 34 (2006) e63.
[27] T.B. Hansen, T.I. Jensen, B.H. Clausen, J.B. Bramsen, B. Finsen, C.K. Damgaard, J. Kjems, Natural RNA circles function as efficient microRNA sponges, Nature 495 (2013) 384-388.
[28] P. Glažar, P. Papavasileiou, N. Rajewsky, circBase: a database for circular RNAs, RNA (2014).
[29] R. Ashwal-Fluss, M. Meyer, N.R. Pamudurti, A. Ivanov, O. Bartok, M. Hanan, N. Evantal, S. Memczak, N. Rajewsky, S. Kadener, circRNA Biogenesis Competes with Pre-mRNA Splicing, Mol Cell 56 (2014) 55-66.
[30] S. Starke, I. Jost, O. Rossbach, T. Schneider, S. Schreiner, L.H. Hung, A. Bindereif, Exon circularization requires canonical splice signals, Cell Rep 10 (2015) 103-111.
[31] D. Liang, J.E. Wilusz, Short intronic repeat sequences facilitate circular RNA production, Genes Dev (2014).
[32] P.G. Zaphiropoulos, Exon skipping and circular RNA formation in transcripts of the human cytochrome P-450 2C18 gene in epidermis and of the rat androgen binding protein gene in testis, Mol Cell Biol 17 (1997) 2985-2993.
[33] S. Kelly, C. Greenman, P.R. Cook, A. Papantonis, Exon Skipping Is Correlated with Exon Circularization, J Mol Biol (2015).
[34] X. You, I. Vlatkovic, A. Babic, T. Will, I. Epstein, G. Tushev, G. Akbalik, M. Wang, C. Glock, C. Quedenau, X. Wang, J. Hou, H. Liu, W. Sun, S. Sambandan, T. Chen, E.M. Schuman, W. Chen, Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity, Nat Neurosci (2015).
[35] T.B. Hansen, E.D. Wiklund, J.B. Bramsen, S.B. Villadsen, A.L. Statham, S.J. Clark, J. Kjems, miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA, EMBO J 30 (2011) 4414-4422.
[36] L.F. Thomas, P. Sætrom, Circular RNAs are depleted of polymorphisms at microRNA binding sites, Bioinformatics 30 (2014) 2243-2246.
[37] M. Jens, N. Rajewsky, Competition between target sites of regulators shapes post-transcriptional gene regulation, Nat Rev Genet 16 (2015) 113-126.
[38] E. Huntzinger, E. Izaurralde, Gene silencing by microRNAs: contributions of translational repression and mRNA decay, Nat Rev Genet 12 (2011) 99-110.
[39] Y. Zhang, X.O. Zhang, T. Chen, J.F. Xiang, Q.F. Yin, Y.H. Xing, S. Zhu, L. Yang, L.L. Chen, Circular intronic long noncoding RNAs, Mol Cell 51 (2013) 792-806.
[40] T.B. Hansen, J. Kjems, C.K. Damgaard, Circular RNA and miR-7 in cancer, Cancer Res 73 (2013) 5609-5612.
[41] M.W. Hentze, T. Preiss, Circular RNAs: splicing's enigma variations, EMBO J 32 (2013) 923-925.

Figure 1

Highlights

• The new non-coding RNA family member…
• We describe the current knowledge of circRNAs
• The existing models of circRNA biogenesis and function are discussed