Computational approaches for detection and quantification of A-to-I RNA-editing


Accepted Manuscript

PII: S1046-2023(18)30167-1
DOI: https://doi.org/10.1016/j.ymeth.2018.11.011
Reference: YMETH 4588
To appear in: Methods
Received Date: 15 August 2018
Revised Date: 14 November 2018
Accepted Date: 16 November 2018

Please cite this article as: Y. Pinto, E.Y. Levanon, Computational approaches for detection and quantification of A-to-I RNA-editing, Methods (2018), doi: https://doi.org/10.1016/j.ymeth.2018.11.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Yishay Pinto and Erez Y. Levanon

The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, 5290002, Ramat-Gan, Israel.

Key Words: RNA-editing; ADAR; RNA-sequencing

Abstract

Adenosine deaminases that act on RNA (ADARs) catalyze adenosine-to-inosine (A-to-I) RNA editing in double-stranded RNA. Such editing is important for protection against false activation of the immune system, but also confers plasticity on the transcriptome by generating several versions of a transcript from a single genomic locus. Recently, great efforts have been made to develop computational methods for detecting editing events directly from RNA-sequencing (RNA-seq) data. These efforts have led to an improved understanding of the makeup of the editome in various genomes. Here we review recent advances in editing detection based on the data available to the researcher, with emphasis on the principles underlying the various methods and the limitations they were designed to overcome. We also discuss the various methods available for analyzing and quantifying editing levels. This review collects and organizes the available approaches for analyzing RNA editing and discusses the current status of the different A-to-I detection methods, with possible directions for extending these approaches.

Introduction

Members of the adenosine deaminases acting on RNA (ADAR) family of enzymes can modify adenosines to inosines by catalyzing adenosine deamination1–3. Editing of endogenous dsRNA by ADAR enzymes was found to be required to prevent activation of the innate immune system4–7. Inosines (I) are recognized as guanosines (G) both by the cellular machineries8 (such as the translational machinery and the spliceosome) and by sequencing technologies. Based on this property, several research groups have developed various computational and biochemical methods to identify A-to-I editing sites directly from high-throughput sequencing data9,10. The aim of this review is to outline the main advances, uses, limitations, and challenges of these methods.

A-to-I editing sites can be identified as A-to-G mismatches in the RNA relative to the genomic DNA11–14. Although detecting A-to-G mismatches by analyzing RNA or transcriptome sequencing data may seem a simple computational task, straightforward approaches that screen for cases where the RNA harbors a “G” at the position of an “A” in the DNA have ended up with a high rate of false positives. The accuracy of the detection can be easily measured by the fraction of A-to-G mismatches relative to all 12 possible DNA-RNA mismatches (Figure 1). A reliable A-to-I editing screen should yield a clear predominance of A-to-G mismatches over the 11 other possible mismatches. The low signal-to-noise ratio of such direct approaches can be due to various issues other than editing that give rise to sequence differences between the RNA reads and the aligned genome/transcriptome reference. These include: (i) sequencing errors, (ii) erroneous alignment of the RNA reads to the reference genome/transcriptome, (iii) various types of genomic polymorphisms, (iv) somatic mutations and (v) spontaneous chemical changes in the RNA or DNA15. Additionally, the power of editing detection can be enhanced through different methods of library preparation and RNA-degradation prevention16. In order to improve the accuracy of editing detection, several research groups have developed sophisticated methods involving statistical modeling, several filtering and/or alignment steps and integration of additional genomic information9. To date, millions of RNA-editing sites have been collected and published in different databases, allowing access to the growing information on RNA-editing in different organisms, tissues and genomic regions (Table 1).
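The signal-to-noise measure described above, the share of A-to-G among all 12 mismatch types, can be illustrated with a minimal Python sketch. The function names and the input format (a list of DNA base / RNA base pairs, one per candidate site) are assumptions made for illustration and are not taken from any published tool. The sketch also assumes that the calls are already oriented to the transcribed strand; in unstranded data, editing on the reverse strand would appear as T-to-C.

```python
from collections import Counter

BASES = "ACGT"


def mismatch_spectrum(calls):
    """Count each of the 12 possible DNA>RNA mismatch types.

    `calls` is an iterable of (dna_base, rna_base) pairs, one per candidate
    site (hypothetical input format, assumed for this sketch).
    """
    counts = Counter({f"{d}>{r}": 0 for d in BASES for r in BASES if d != r})
    for dna, rna in calls:
        if dna in BASES and rna in BASES and dna != rna:
            counts[f"{dna}>{rna}"] += 1
    return counts


def a_to_g_fraction(counts):
    """Fraction of A>G mismatches among all mismatches (0.0 if there are none)."""
    total = sum(counts.values())
    return counts["A>G"] / total if total else 0.0


# Toy example: 8 A>G calls and 2 calls of other types.
calls = [("A", "G")] * 8 + [("C", "T"), ("G", "A")]
spectrum = mismatch_spectrum(calls)
fraction = a_to_g_fraction(spectrum)
```

A fraction close to 1 suggests a clean editing signal, whereas a fraction near 1/12 would be expected if the mismatches were spread evenly over all types.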

1. Detection of editing without previous knowledge (de novo detection)

1.1 Computational methods

A scenario in which both RNA and DNA from the same biological source can be sequenced at high coverage represents the ultimate starting point for editing detection. The availability of such data can eliminate most of the sources of false editing prediction so that detection of editing is easy and accurate. Having such matched DNA sequencing data facilitates the filtering out of common and rare mutations, which would appear as mismatches between the DNA and the reference genome. However, due to high cost and lack of sampling availability, researchers generally have to contend with a less optimal situation (e.g. having only RNA-sequencing data). As a result, various computational approaches for editing detection have been developed with the aim of matching each of the scenarios for data availability with a unique, “tailored” computational tool. A list of implemented tools to detect RNA-editing can be found in Table 2.

1.1.1 DNA-RNA methods

The first type of method to identify RNA-editing utilizes RNA and matched DNA sequencing data from the same individual in order to reduce the identification of single-nucleotide polymorphisms as editing events17. Moreover, in order to reduce false identification of somatic mutations as editing events, the optimal setting is to have both DNA and RNA-seq data from the same tissue of the same individual. In general, such methods utilize the DNA-seq data to filter out any non-homozygous genomic sites, followed by additional basic filtering procedures that are usually applied to the RNA-editing candidates17–21 (Figure 2A). A list of suggested filtering steps is provided in Table 3. The most effective procedures depend on defining cutoffs for the minimal coverage of reads at a specific position and the minimal sequencing quality for each base in order to increase the reliability of the possible editing signal. In addition to these basic filtering steps, several methods carefully analyze the editing sites around splice junctions17,19,21. These regions in the genome are particularly prone to misalignment because the alignment tools need to open gaps within the RNA reads in order to represent the introns. Another common technique is to use several sequence alignment steps, employing different aligners in order to minimize alignment errors17,19,21. Finally, since editing frequencies vary dramatically in different genomic regions, a set of “personalized” computational steps can be applied. The most studied example employs different additional criteria for sites within or outside Alu repeats17. We discuss the unique characteristics of editing in Alu repeats and customized methods for such detection in section 1.1.5.1.

1.1.2 Analysis with only RNA-seq

Unfortunately, matched DNA and RNA sequencing from the same biological sample is not always available. Hence, a number of approaches have been developed to identify RNA editing sites in the absence of DNA sequencing data. Probably the major challenge in identifying RNA editing sites using RNA-seq data alone is to discriminate between RNA editing sites and genomic single nucleotide variation (SNV) events (Figure 2B). Since common single nucleotide polymorphisms (SNPs) are well documented and can easily be removed from further processing, it is the identification and elimination of rare genomic polymorphisms that becomes a crucial step in the absence of genome sequencing data. This challenge can be addressed by performing parallel analyses of multiple RNA-seq datasets, which may be derived either from different samples of the same individual or from multiple individuals22,23. Each RNA-seq dataset is analyzed separately for editing candidates. Commonly known SNPs are removed, and basic filtering steps are applied as discussed previously. Where RNA-seq datasets from multiple individuals are analyzed, the two options can be distinguished because RNA-editing should appear in many samples, while rare genomic polymorphisms should not. In contrast, when analyzing mismatches from multiple RNA-seq datasets derived from the same individual, rare genomic polymorphisms should generally appear in all samples at levels that reflect heterozygosity (50%) or, in some cases, homozygosity (100%). Proximate genomic mutations should produce a haplotype structure in the same individual and be highly correlated. Editing events, however, are usually more sporadic and should only be partially correlated, if at all (Figure 2B). This difference in correlation can be used to identify RNA-editing events from a single RNA-seq dataset, utilizing the allelic linkage between genomic variations to differentiate between SNVs and RNA editing events24. Currently, most RNA-editing screens utilize variations of the “RNA-seq alone” approach and have led to the discovery of a large number of human editing sites.

As mentioned above, RNA-editing detection is influenced by library preparation, batch effects and other confounding factors16,25. When designing an experiment with more than one sample, using uniform library preparation and sequencing design for all samples, and sequencing in the same run/machine/center, will help to reduce such batch effects. Hence, when analyzing several public RNA-seq datasets, retrieving all of them from the same study is preferable to comparing data from different and unrelated studies.
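The two heuristics described in this section, recurrence across individuals and genotype-like levels within one individual, can be sketched in Python as follows. This is a simplified illustration and not the procedure of any cited method: the function names, the input formats, and the default thresholds (`min_individuals=2`, `tolerance=0.1`) are assumptions chosen for the example.

```python
def recurrent_sites(candidates_by_individual, min_individuals=2):
    """Keep candidate sites observed in at least `min_individuals` individuals.

    `candidates_by_individual` maps an individual ID to a list of site keys,
    e.g. (chromosome, position). Rare genomic polymorphisms are expected to be
    private to one or a few individuals, whereas editing sites should recur.
    """
    seen = {}
    for individual, sites in candidates_by_individual.items():
        for site in set(sites):
            seen.setdefault(site, set()).add(individual)
    return {site for site, who in seen.items() if len(who) >= min_individuals}


def looks_like_genomic_variant(levels, tolerance=0.1):
    """Flag a site whose G fraction is ~0.5 or ~1.0 in every sample of one individual.

    `levels` holds the G fraction of the site in each sample of the same
    individual; the tolerance is an arbitrary illustrative value.
    """
    if not levels:
        return False
    for expected in (0.5, 1.0):
        if all(abs(level - expected) <= tolerance for level in levels):
            return True
    return False


candidates = {
    "ind1": [("chr1", 100), ("chr1", 200)],
    "ind2": [("chr1", 100)],
    "ind3": [("chr1", 100), ("chr2", 5)],
}
shared = recurrent_sites(candidates, min_individuals=2)
```

In practice, thresholds of this kind would need to be tuned to the coverage and number of samples, and sites with allele-specific expression could deviate from the expected 50% level.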

1.1.3 Detection of RNA-editing in the absence of a reference genome

The methods discussed so far all require the alignment of the RNA- and/or DNA-seq reads to a reference genome. However, for some species neither transcriptome nor genome references are available, making the problem of detecting RNA editing much more challenging26. In such cases, both DNA and RNA sequencing data from the same samples are required since there is no prior knowledge of where the SNPs are located. However, as most RNA editing screens focus on the transcribed regions, it can be considered sufficient to assemble the transcriptome from the RNA reads, obviating the highly demanding effort of assembling the whole genome from the DNA reads. For example, two recent studies27,28 successfully detected A-to-I editing sites in different reference-less cephalopods using both RNA- and DNA-sequencing data. First, a de novo transcriptome was assembled from the RNA reads using the Trinity de novo assembly package29. Next, coding sequences were identified (Figure 2C) and both RNA and DNA reads were aligned to the assembled transcriptome. RNA-DNA mismatches were called, while mismatches within the DNA reads were assigned as SNVs and removed from the pool of possible RNA editing candidates. Only positions where a consensus in the DNA reads met variations in the RNA reads were further processed and filtered. As the next stage, a binomial test was performed in order to distinguish between low-level editing sites and sequencing errors30. This approach produced a clean signal for A-to-G mismatches, an indication of the robustness of the approach. Interestingly, through this approach, cephalopods were shown to harbor a large number of editing events in the coding regions that lead to high diversity at the proteome level, although the functional impact of this observation is still unknown28.
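The binomial test mentioned above can be illustrated with a short Python sketch. It asks how likely it is to observe at least the given number of ‘G’ reads at a position if every ‘G’ were a sequencing error. The per-base error rate of 0.001 (roughly a Phred quality of 30) and the significance cutoff are assumptions made for this example; the cited studies may use different parameters, and a genome-wide screen would also require a multiple-testing correction, which is omitted here.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_editing_candidate(g_reads, coverage, error_rate=0.001, alpha=0.05):
    """Return True if the G count is unlikely to arise from sequencing errors alone.

    `error_rate` and `alpha` are illustrative defaults, not published values.
    """
    return binomial_tail(g_reads, coverage, error_rate) < alpha


# 5 'G' reads out of 100 are very unlikely under a 0.1% error rate,
# whereas a single 'G' in 1000 reads is fully compatible with it.
strong = is_editing_candidate(5, 100)
weak = is_editing_candidate(1, 1000)
```

Direct summation is adequate for typical per-site coverage; for very deep data a library routine for the binomial survival function would be a natural replacement.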
1.1.4 Lateral methods to detect RNA editing

Careful alignment of RNA reads to the proper genomic location is a key step in reducing systematic alignment errors. Genomic regions that are particularly prone to such alignment errors include duplications, splice junctions, repeats, and pseudogenes. Misalignment of reads to these regions could be mistaken for multiple editing sites and, for this reason, most detection protocols allow only a limited number of mismatches between the RNA reads and their target genome sequence. Thus, although ADARs are known to have the ability to deaminate clusters of several proximate adenosines, creating hyper-edited RNA molecules, standard detection methods will fail to discover this editing, particularly in heavily edited short reads. The need to resolve this issue has prompted the development of a specific computational protocol that does allow the heavily edited reads to be correctly mapped31,32. The rationale behind this approach is to focus on reads that cannot be aligned to the genome under stringent alignment criteria, and to realign them to the genome after reducing the complexity of both DNA and RNA sequences. This can be done by transforming the RNA/DNA sequence information to a three-letter code (e.g. by an A→G transformation). In this way, mismatches in excessively edited (“hyper-edited”) RNAs due to A-to-I editing are masked and standard alignment tools can be employed to detect the genomic origins of these sequences. If a transformed RNA successfully aligns to the transformed genome, the original sequences can then be recovered and the mismatches examined. A particularly large number and density of A-to-G mismatches indicates that the RNA was ultra-edited. As expected, almost all hyper-edited reads are of the A-to-G type, while very few reads with excessive mismatches of other types have been found. Two additional methods33,34, utilizing a similar rationale, have also been introduced to identify hyper-edited regions, which seem to be a common occurrence. Currently, over 200 million such hyper-edited reads have been found in human RNA-seq data35, indicating that hyper-edited reads represent >0.1% of the RNA-seq reads in human36. In addition, large numbers of such hyper-editing events have now been found across metazoans37, showing that they are not limited to humans.

1.1.5 Tailored methods for detection of RNA editing

1.1.5.1 Alu elements

Editing is not distributed evenly across the genome because the favored targets of ADARs are long double-stranded RNA (dsRNA) molecules. This structure can be formed by adjacent inverted repeats such as mobile elements that reside within the introns or UTRs of a transcript. A notable and heavily studied example involves editing in the highly abundant primate-specific Alu elements.
Each inverted Alu element is about 300 bp long and readily contributes to the formation of the long dsRNA structures that serve as ADAR substrates. Indeed, the vast majority of human editing sites are located within Alu elements, even though most of these sites are edited at very low levels. Most of the Alu sites are edited by ADAR1, the primary editor of repetitive elements38. The high number of editing sites in these areas prompted the development of a unique method for identifying editing sites in Alu elements and calculating a global measurement of editing in Alu elements in a given sample39. The latter will be discussed in section 2. The key feature of these methods is the ability to detect regions that exhibit a high density of editing sites even though most of them are edited at very low levels. With these methods, proximate sites are identified as reliable editing sites even though each individual site alone does not meet the criteria for minimal editing levels and/or coverage required by most other approaches. Ultra-deep sequencing of dozens of Alu repeats has provided an estimate of over 100 million Alu editing sites in the human genome39.

1.1.5.2 MicroRNAs

MicroRNAs (miRNAs) are short RNAs (typically 22 nt long) that play a key role in transcription homeostasis. During their maturation, miRNAs form dsRNA structures, which make them natural candidates for RNA-editing. Indeed, pre- and mature miRNAs were both found to be edited40. A number of region-based methods have been introduced in response to the challenge of identifying true RNA-editing sites within mature microRNAs from short miRNA-seq reads30,41–44. Such methods usually consist of three main steps. The first step involves preprocessing of small-RNA reads using tailored methods of adapter trimming, as adapter traces in short reads severely disturb their proper alignment. This step is followed by enrichment of the library for the expected read length and filtering out of low-quality reads. The second step is alignment to the genome30,41, the miRNAome42 or both43,44. Recommended aligners and parameters for editing detection can be found in Ziemann et al.45. The last step is to obtain the read counts for each of the miRNA nucleotides, followed by applying basic filtering and/or statistical models to distinguish between miRNA editing events and sequencing errors. Although only dozens of adenosines within mature miRNAs were found to be edited, and generally only at low levels, editing events in microRNAs were found to have an important functional effect on miRNA regulation in several cases46–48.

1.1.6 Additional genomic regions

Editing can potentially occur in all transcribed regions that form double-stranded RNA (dsRNA).
Here, we briefly summarize some of these edited regions, even though there are no specific editing detection methods associated with them. During the biogenesis of circular RNAs (circRNAs), dsRNA structures may be formed in the flanking introns and indeed, as might be expected, introns surrounding circRNAs have been identified as ADAR targets and are enriched for RNA-editing events49. Similarly, many lncRNAs may also be subjected to RNA-editing. Edited lncRNAs were detected by comparing RNA and DNA reads from the same sample50, by comparing RNA-seq data from different C. elegans strains, accompanied by analysis of knockout strains51, or by intersection of databases52. Foreign viral RNA can also be the target of RNA-editing by the host ADARs. Evidence for editing in viral RNAs (mainly from target-specific sequencing methods) was found in several families of viruses and in a variety of different hosts53,54.
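As an illustration of the three-letter realignment used for hyper-edited reads (section 1.1.4), the following Python sketch transforms both read and reference by an A→G substitution, locates the read in the transformed reference, and then recovers the mismatches from the original sequences. It is a toy model under several stated assumptions: an exact string search stands in for a real aligner, only one strand and one transformation are considered, and the function names and thresholds (`min_edits`, `min_fraction`) are hypothetical.

```python
def to_three_letter(seq, src="A", dst="G"):
    """Collapse one base into another (A->G by default), giving a three-letter alphabet."""
    return seq.upper().replace(src, dst)


def find_hyper_edited(read, genome, min_edits=3, min_fraction=0.8):
    """Locate a read in three-letter space and report its A>G mismatches.

    Returns (start, [(genome_pos, 'A', 'G'), ...]) for reads that carry at least
    `min_edits` A>G mismatches making up at least `min_fraction` of all
    mismatches, otherwise None. Thresholds are illustrative assumptions.
    """
    read, genome = read.upper(), genome.upper()
    # Stand-in for a real aligner: exact search after masking A/G differences.
    start = to_three_letter(genome).find(to_three_letter(read))
    if start < 0:
        return None
    window = genome[start:start + len(read)]
    # Recover the original sequences and list the positions where they differ.
    mismatches = [
        (start + i, g, r) for i, (g, r) in enumerate(zip(window, read)) if g != r
    ]
    if not mismatches:
        return None
    a_to_g = [m for m in mismatches if m[1] == "A" and m[2] == "G"]
    if len(a_to_g) >= min_edits and len(a_to_g) / len(mismatches) >= min_fraction:
        return start, a_to_g
    return None


genome = "TTCAGATAACACTGG"
read = "CGGGTGACGC"  # genome[2:12] with four of its adenosines read as G
hit = find_hyper_edited(read, genome)
```

A complete pipeline would additionally handle the reverse strand (where editing appears as T-to-C), tolerate sequencing errors during alignment, and apply quality and density filters to the recovered mismatches.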

1.2 Chemical-based methods

While most of the methods for detecting RNA-editing are based on computational analysis of RNA-seq, and on the base-pairing of inosine with cytidine, a number of chemical-based methods have also been developed recently. One such approach, iSeq55, is based on inosine-specific cleavage and subsequent sequencing of inosine-containing fragments. In another method, ICE-seq56, RNA is treated with acrylonitrile, which is added to the N1 position of inosine to form cyanoethylinosine and causes truncation of the reverse transcription of the treated RNAs. Editing sites can then be identified by comparing treated with untreated RNAs, since the editing signal is retained only in the untreated RNAs. Finally, ADAR1 CLIP-seq57 can also be employed to identify the transcriptome-wide binding regions of ADAR1.

2. Quantifying overall editing in a sample

Measuring the overall editing activity in a sample is an important part of studying the roles of RNA editing. A properly normalized and comparable measurement of editing levels in RNA-sequencing data allows researchers to compare the editing activity in different cells, tissues, traits, or treatments. Several methods have therefore been developed to address this need. The first method is based on the observation that the vast majority of the editing sites in the human genome are located in Alu elements. Even though most of the Alu sites are edited at low levels, they can be detected with a very high signal-to-noise ratio. Hence, a weighted average of all Alu editing events (Alu editing index; AEI) can be used to provide a reliable measurement of editing activity in a sample58. For example, the AEI in Figure 2E will be 7/33 (3 ‘A’s, each covered by 11 reads, with 7 of the 33 covering bases edited and appearing as ‘G’s). Following a similar rationale, since hyper-edited RNA molecules are also detected with high accuracy even in genomes that lack sufficient polymorphism data, a normalized number of hyper-edited reads in an RNA-seq file can also be used as an editing index. A high correlation was found between these two indices58.

Alternatively, some studies have used an averaged measure in which editing levels are accumulated from a predefined set of multiple sites38,59.
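The AEI arithmetic above can be written out as a minimal Python sketch. The index is computed as the total number of ‘G’ reads over the total coverage at the adenosines considered, which is equivalent to a coverage-weighted average of the per-site editing levels. The function name and input format are assumptions; in the example, only the 4/11 level of the leftmost site and the 7/33 total come from the text, and the split of the remaining 3 ‘G’ reads between the other two adenosines is hypothetical.

```python
def alu_editing_index(sites):
    """Coverage-weighted editing index over adenosines in Alu elements.

    `sites` is an iterable of (g_reads, coverage) pairs, one per adenosine.
    Returns 0.0 when there is no coverage at all.
    """
    sites = list(sites)
    edited = sum(g for g, _ in sites)
    total = sum(cov for _, cov in sites)
    return edited / total if total else 0.0


# Three adenosines, each covered by 11 reads, with 7 'G' reads in total.
# The per-site split (4, 2, 1) is illustrative; only the total of 7 and the
# leftmost value of 4 are given in the text.
aei = alu_editing_index([(4, 11), (2, 11), (1, 11)])
```

Pooling counts before dividing, rather than averaging per-site ratios, keeps poorly covered sites from dominating the index.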

3. Measuring editing levels of known sites

The aim of many studies has been to identify sites that are differentially edited between two (or more) sets of samples by analyzing RNA-seq data. In such cases, de novo detection methods that utilize relatively strict alignment and filtering parameters do not allow measurement of editing levels per editing site per sample. To resolve this issue, researchers determine the fraction of edited transcripts of a site (the editing level) by dividing the number of ‘G’-containing transcripts that map to the site by the total number of transcripts mapped to the position. For example, the editing level of the leftmost editing site in Figure 2E is 4/11, as there is evidence for editing in 4 reads out of 11. In this kind of analysis, the first step is to define the set of editing sites of interest. This can be based on genomic properties (e.g. recoding sites60), evolutionary conservation61, genes of interest62, etc. The next step is to pile up the aligned RNA-seq reads so that they are organized in a position-based manner, followed by base calling per position. The editing levels can then be measured easily for each site of interest by calculating the ratio of ‘G’s out of the number of reads. This procedure can be accomplished by using the REDItoolKnown tool, which is a part of the REDItools suite19,63, or by using SAMtools mpileup64 and in-house scripts. This approach has been used to detect several editing sites with altered levels of editing in various types of cancer65. A notable example is the editing site within the AZIN1 gene66, which was shown to promote cancer.

The accuracy of measuring the editing level of a site depends on the site coverage in the RNA-seq dataset, which in turn is determined mainly by the sequencing depth and the expression level of the transcript of interest. Unfortunately, sufficient coverage for each editing site is often not available in a typical RNA-seq experiment. In order to overcome this limitation, target-enrichment methods coupled with deep sequencing can be applied to provide high coverage of a predefined set of sequences of interest67. Zhang et al.68 developed a method to screen for up to 960 loci in 48 different samples in a single PCR reaction. This method not only permits accurate measurement of the editing levels of the sites of interest but also enables the identification of novel editing sites surrounding them. The high accuracy of editing levels achieved by such approaches enables the detection of small differences in editing levels across samples.
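The per-site calculation, and its dependence on coverage, can be illustrated with the following Python sketch. It takes per-position base counts (as could be derived from a pileup by an in-house script; the dictionary format here is an assumption) and returns the editing level together with a Wilson score interval. The interval is added only to show how uncertainty shrinks with coverage; it is a standard statistical choice and not a method described in the cited works.

```python
from math import sqrt


def editing_level(base_counts):
    """Fraction of reads showing 'G' at a known editing site.

    `base_counts` maps base -> read count at the position (assumed format).
    Returns None when the site is not covered.
    """
    coverage = sum(base_counts.values())
    if coverage == 0:
        return None
    return base_counts.get("G", 0) / coverage


def wilson_interval(g_reads, coverage, z=1.96):
    """Approximate 95% Wilson score interval for the editing level."""
    if coverage == 0:
        return None
    p = g_reads / coverage
    denom = 1 + z * z / coverage
    center = (p + z * z / (2 * coverage)) / denom
    half = z * sqrt(p * (1 - p) / coverage + z * z / (4 * coverage**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# The 4/11 example from the text, and the same proportion at 100x the coverage.
level = editing_level({"A": 7, "G": 4})
low_cov = wilson_interval(4, 11)
high_cov = wilson_interval(400, 1100)
```

With 11 reads the interval spans a large part of the possible range, whereas at 1100 reads it is narrow, which is the motivation for the target-enrichment approaches described above.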

Concluding Remarks

Next-generation RNA-sequencing is the most common and established method of quantifying gene expression. Indeed, vast amounts of human RNA-seq data have been deposited in public databases in recent years. The recent advances in sequencing technologies, coupled with the development of the computational pipelines to detect RNA editing discussed here, have already shed light on the role played by editing in health and disease. To date, the vast majority of editing studies have been conducted in humans or common model organisms. The fascinating finding of extensive recoding editing in cephalopod transcriptomes28, and its effect on their genome evolution, opens new research avenues in RNA-editing biology.

In the next few years we expect major advances in detecting RNA-editing from single-cell experiments69. To date, the ability either to detect editing reliably or to measure editing levels accurately using single-cell RNA-seq has been limited by the very low typical coverage and possible bias from PCR duplicates in such experiments. Understanding RNA editing at the single-cell level can unravel its fundamental impact in creating large-scale cell diversity.

Acknowledgement

The authors thank Orshay Gabay for help with the graphical work and Amos Schaffer for a critical reading of the manuscript. This work was supported by the Israel Science Foundation (1380/14), the MINERVA Stiftung ARCHES award and the 1-INO-2018-639-A-N grant from the JDRF.

References 1.

Savva, Y. a, Rieder, L. E. & Reenan, R. a. The ADAR protein family. Genome Biol. 13, 252 (2012).

2.

Nishikura, K. Functions and regulation of RNA editing by ADAR deaminases. Annu. Rev. Biochem. 79, 321–49 (2010).

3.

Bass, B. L. RNA editing by adenosine deaminases that act on RNA. Annu. Rev. Biochem.

71, 817–46 (2002). 4.

Mannion, N. M. et al. The RNA-editing enzyme ADAR1 controls innate immune responses to RNA. Cell Rep. 9, 1482–94 (2014).

5.

George, C. X., Ramaswami, G., Li, J. B. & Samuel, C. E. Editing of Cellular Self-RNAs by Adenosine Deaminase ADAR1 Suppresses Innate Immune Stress Responses. J. Biol. Chem. 291, 6158–68 (2016).

6.

Liddicoat, B. J. et al. RNA editing by ADAR1 prevents MDA5 sensing of endogenous dsRNA as nonself. Science (80-. ). 349, 1–9 (2015).

7.

Pestal, K. et al. Isoforms of RNA-Editing Enzyme ADAR1 Independently Control Nucleic Acid Sensor MDA5-Driven Autoimmunity and Multi-organ Development. Immunity 43, 933–944 (2015).

8.

BASILIO, C., WAHBA, A. J., LENGYEL, P., SPEYER, J. F. & OCHOA, S. Synthetic polynucleotides and the amino acid code. V. Proc. Natl. Acad. Sci. U. S. A. 48, 613–6 (1962).

9.

Ramaswami, G. & Li, J. B. Identification of human RNA editing sites: A historical perspective. Methods 107, 42–47 (2016).

10.

Diroma, M. A., Ciaccia, L., Pesole, G. & Picardi, E. Elucidating the editome: bioinformatics approaches for RNA editing detection. Brief. Bioinform. (2017). doi:10.1093/bib/bbx129

11.

Levanon, E. Y. et al. Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat. Biotechnol. 22, 1001–1005 (2004).

12.

Athanasiadis, A., Rich, A. & Maas, S. Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2, e391 (2004).

13.

Kim, D. D. Y. et al. Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res. 14, 1719–1725 (2004).

14.

Blow, M., Futreal, A. P., Wooster, R. & Stratton, M. R. A survey of RNA editing in human brain. Genome Res. 14, 2379–2387 (2004).

15.

Chen, L., Liu, P., Evans, T. C. & Ettwiller, L. M. DNA damage is a pervasive cause of

sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017). 16.

Ouyang, Z. et al. Accurate identification of RNA editing sites from primitive sequence with deep neural networks. Sci. Rep. 8, 6005 (2018).

17.

Ramaswami, G. et al. Accurate identification of human Alu and non-Alu RNA editing sites. Nat. Methods 9, 579–581 (2012).

18.

Bahn, J. H. et al. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 22, 142–150 (2012).

19.

Picardi, E. & Pesole, G. REDItools: High-throughput RNA editing detection made easy. Bioinformatics 29, 1813–1814 (2013).

20.

Park, E., Williams, B., Wold, B. J. & Mortazavi, A. RNA editing in the human ENCODE RNA-seq data. Genome Res. 22, 1626–1633 (2012).

21.

Wang, Z. et al. RES-Scanner: a software package for genome-wide identification of RNAediting sites. Gigascience 5, 37 (2016).

22.

Ramaswami et al. Identifying RNA editing sites using RNA sequencing data alone. Methods Mol. Biol. 10, 128–132 (2013).

23.

Zhu, S., Xiang, J.-F., Chen, T., Chen, L.-L. & Yang, L. Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences. BMC Genomics 14, 206 (2013).

24.

Zhang, Q. & Xiao, X. Genome sequence–independent identification of RNA editing sites. Nat. Methods 12, 347–350 (2015).

25.

’t Hoen, P. A. C. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–22 (2013).

26.

Li, Q. et al. Caste-specific RNA editomes in the leaf-cutting ant Acromyrmex echinatior. Nat. Commun. 5, 4943 (2014).

27.

Alon, S. et al. The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing. Elife 4, (2015).

28.

Liscovitch-Brauer, N. et al. Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods. Cell 169, 191–202.e11 (2017).

29.

Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–512 (2013).

30.

Alon, S., Mor, E. & Vigneault, F. Systematic identification of edited microRNAs in the human brain Systematic identification of edited microRNAs in the human. Genome Res. 22, 1533–40 (2012).

31.

Carmi, S., Borukhov, I. & Levanon, E. Y. Identification of widespread ultra-edited human RNAs. PLoS Genet. 7, e1002317 (2011).

32.

Porath, H. T., Carmi, S. & Levanon, E. Y. A genome-wide map of hyper-edited RNA reveals numerous new sites. Nat. Commun. 5, 4726 (2014).

33.

McKerrow, W. H., Savva, Y. A., Rezaei, A., Reenan, R. A. & Lawrence, C. E. Predicting RNA hyper-editing with a novel tool when unambiguous alignment is impossible. BMC Genomics 18, 522 (2017).

34.

Whipple, J. M. et al. Genome-wide profiling of the C. elegans dsRNAome. RNA 21, 786– 800 (2015).

35.

Mangul, S. et al. ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Genome Biol. 19, 36 (2018).

36.

Mele, M. et al. The human transcriptome across tissues and individuals. Science (80-. ). 348, 660–665 (2015).

37.

Porath, H. T., Knisbacher, B. A., Eisenberg, E. & Levanon, E. Y. Massive A-to-I RNA editing is common across the Metazoa and correlates with dsRNA abundance. Genome Biol. 18, 185 (2017).

38.

Tan, M. H. et al. Dynamic landscape and regulation of RNA editing in mammals. Nature 550, 249–254 (2017).

39.

Bazak, L. et al. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. 24, 365–376 (2014).

40.

Nishikura, K. A-to-I editing of coding and non-coding RNAs by ADARs. Nat. Rev. Mol. Cell Biol. 17, 83–96 (2016).

41.

De Hoon, M. J. L. et al. Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries. Genome Res. 20, 257–264 (2010).

42.

Ekdahl, Y., Farahani, H. S., Behm, M., Lagergren, J. & Öhman, M. A-to-I editing of microRNAs in the mammalian brain increases during development. Genome Res. 22, 1477– 1487 (2012).

43.

Gong, J. et al. Comprehensive analysis of human small RNA sequencing data provides insights into expression profiles and miRNA editing. RNA Biol. 11, 1375–85 (2014).

44.

Zheng, Y. et al. Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles. Nucleic Acids Res. 44, e123 (2016).

45.

Ziemann, M., Kaspi, A. & El-Osta, A. Evaluation of microRNA alignment techniques. RNA 22, 1120–38 (2016).

46.

Pinto, Y., Buchumenski, I., Levanon, E. Y. & Eisenberg, E. Human cancer tissues exhibit reduced A-to-I editing of miRNAs coupled with elevated editing of their targets. Nucleic Acids Res. (2017). doi:10.1093/nar/gkx1176

47.

Wang, Y. et al. Systematic characterization of A-to-I RNA editing hotspots in microRNAs across human cancers. Genome Res. 27, 1112–1125 (2017).

48.

Nigita, G. et al. microRNA editing in seed region aligns with cellular changes in hypoxic conditions. Nucleic Acids Res. 44, 6298–308 (2016).

49.

Ivanov, A. et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. Cell Rep. 10, 170–7 (2014).

50.

Picardi, E., D’Erchia, A. M., Gallo, A., Montalvo, A. & Pesole, G. Uncovering RNA Editing Sites in Long Non-Coding RNAs. Front. Bioeng. Biotechnol. 2, 64 (2014).

51.

Goldstein, B. et al. A-to-I RNA editing promotes developmental stage-specific gene and lncRNA expression. Genome Res. 27, 462–470 (2017).

52.

Gong, J. et al. LNCediting: a database for functional effects of RNA editing in lncRNAs. Nucleic Acids Res. 45, D79–D84 (2017).

53.

Samuel, C. E. Adenosine deaminases acting on RNA (ADARs) are both antiviral and proviral. Virology 411, 180–193 (2011).

54.

Tomaselli, S., Galeano, F., Locatelli, F. & Gallo, A. ADARs and the Balance Game between Virus Infection and Innate Immune Cell Response. Curr. Issues Mol. Biol. 17, 37–51 (2015).

55.

Cattenoz, P. B., Taft, R. J., Westhof, E. & Mattick, J. S. Transcriptome-wide identification of A > I RNA editing sites by inosine specific cleavage. RNA 19, 257–270 (2013).

56.

Sakurai, M. et al. A biochemical landscape of A-to-I RNA editing in the human brain transcriptome. Genome Res. 24, 522–534 (2014).

57.

Bahn, J. H. et al. Genomic analysis of ADAR1 binding and its involvement in multiple RNA processing pathways. Nat. Commun. 6, 6355 (2015).

58.

Paz-Yaacov, N. et al. Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors. Cell Rep. 13, 267–276 (2015).

59.

Khermesh, K. et al. Reduced levels of protein recoding by A-to-I RNA editing in Alzheimer's disease. RNA 22, 1–13 (2016).

60.

Peng, X. et al. A-to-I RNA Editing Contributes to Proteomic Diversity in Cancer. Cancer Cell 33, 817–828.e7 (2018).

61.

Pinto, Y., Cohen, H. Y. & Levanon, E. Y. Mammalian conserved ADAR targets comprise only a small fragment of the human editosome. Genome Biol. 15, R5 (2014).

62.

Jain, M. et al. RNA editing of Filamin A pre-mRNA regulates vascular contraction and diastolic blood pressure. EMBO J. e94813 (2018). doi:10.15252/embj.201694813

63.

Picardi, E., D’Erchia, A. M., Montalvo, A. & Pesole, G. in Current Protocols in Bioinformatics 49, 12.12.1-12.12.15 (John Wiley & Sons, Inc., 2015).

64.

Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

65.

Han, L. et al. The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers. Cancer Cell 28, 515–528 (2015).

66.

Chen, L. et al. Recoding RNA editing of AZIN1 predisposes to hepatocellular carcinoma. Nat. Med. 19, 209–16 (2013).

67.

Li, J. B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324, 1210–1213 (2009).

68.

Zhang, R. et al. Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nat. Methods 11, 51–4 (2014).

69.

Picardi, E., Horner, D. S. & Pesole, G. Single-cell transcriptomics reveals specific RNA editing signatures in the human brain. RNA 23, 860–865 (2017).

70.

Kiran, A. M., O’Mahony, J. J., Sanjeev, K. & Baranov, P. V. Darned in 2013: inclusion of model organisms and linking with Wikipedia. Nucleic Acids Res. 41, D258-61 (2013).

71.

Ramaswami, G. & Li, J. B. RADAR: A rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res. 42, D109-13 (2014).

72.

Picardi, E., D’Erchia, A. M., Lo Giudice, C. & Pesole, G. REDIportal: a comprehensive database of A-to-I RNA editing events in humans. Nucleic Acids Res. 45, D750–D757 (2017).

73.

Piechotta, M., Wyler, E., Ohler, U., Landthaler, M. & Dieterich, C. JACUSA: site-specific identification of RNA editing events from replicate sequencing data. BMC Bioinformatics 18, 7 (2017).

74.

John, D., Weirick, T., Dimmeler, S. & Uchida, S. RNAEditor: easy detection of RNA editing events and the introduction of editing islands. Brief. Bioinform. 18, 993–1001 (2017).

75.

Xiong, H. et al. RED-ML: a novel, effective RNA editing detection method based on machine learning. Gigascience 6, 1–8 (2017).

76.

Zhang, F., Lu, Y., Yan, S., Xing, Q. & Tian, W. SPRINT: an SNP-free toolkit for identifying RNA editing sites. Bioinformatics 33, 3538–3548 (2017).

77.

Kim, M., Hur, B. & Kim, S. RDDpred: a condition-specific RNA-editing prediction model from RNA-seq data. BMC Genomics 17 Suppl 1, 5 (2016).

78.

Lee, S. Y., Joung, J.-G., Park, C. H., Park, J. H. & Kim, J. H. RCARE: RNA Sequence Comparison and Annotation for RNA Editing. BMC Med. Genomics 8 Suppl 2, S8 (2015).

79.

Alon, S., Erew, M. & Eisenberg, E. DREAM: a webserver for the identification of editing sites in mature miRNAs using deep sequencing data. Bioinformatics 31, 1–2 (2015).

-----

Figure 1. Estimating the accuracy of editing detection. The signal-to-noise ratio of editing detection can be estimated by measuring the fraction of A-to-G mismatches relative to all mismatches, including the 11 other possible types. The figure demonstrates the high signal-to-noise ratio in ALU elements (right panel) relative to non-repetitive sequences (e.g. coding sequences, left panel) and the dramatic change when filtering steps are applied to the initial set of mismatches.
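The estimate described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes that each mismatch has already been called and oriented to the transcribed strand, and that it is encoded as a two-letter string (reference base followed by read base). The function name `a_to_g_fraction` and this encoding are hypothetical choices made for illustration and are not taken from any of the reviewed tools.

```python
from collections import Counter

BASES = "ACGT"
# The 12 possible reference->read mismatch types; "AG" means A in the
# reference and G in the read (the signature of A-to-I editing).
MISMATCH_TYPES = [ref + alt for ref in BASES for alt in BASES if ref != alt]


def a_to_g_fraction(mismatches):
    """Return the fraction of A-to-G mismatches among all valid mismatches.

    `mismatches` is an iterable of two-letter strings (hypothetical encoding).
    Entries that are not one of the 12 mismatch types are ignored.
    """
    counts = Counter(m for m in mismatches if m in MISMATCH_TYPES)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no mismatches observed, nothing to estimate
    return counts["AG"] / total


if __name__ == "__main__":
    observed = ["AG", "AG", "AG", "TC"]
    print(a_to_g_fraction(observed))
```

With unstranded libraries, editing on the reverse strand appears as T-to-C, so the strand orientation assumed here would have to be resolved before counting.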

Figure 2. Detection of A-to-I editing. (A) Using both RNA-seq and matched DNA-seq. Genomic SNPs (light blue, right) can easily be distinguished from RNA-editing events (yellow, left). (B) Using RNA-seq alone. In this case genomic polymorphisms (light blue and blue) can be identified and filtered out by their allelic linkage. (C) In the absence of a reference genome. The transcriptome is assembled from the RNA-seq reads. Both RNA and DNA reads are aligned to the assembled transcriptome and genomic polymorphisms (light blue) are filtered out. (D) Illustration of hyper-edited reads, the result of a cluster of editing events (yellow). (E) Detection of editing in ALU elements. Tailored detection methods permit the identification of many editing events (yellow) despite their relatively low frequency. Reference sequences are colored purple, sequencing reads are colored green and sequencing errors are colored orange.
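The logic of panel A, in which a mismatch seen in RNA but absent from the matched DNA is treated as an editing candidate, can be sketched as follows. This is only an illustration under simplifying assumptions: per-site base counts are supplied as dictionaries, the site lies on the forward strand, and the function name `classify_site` and the `min_dna_cov` threshold are hypothetical and do not correspond to any specific tool.

```python
def classify_site(ref_base, rna_counts, dna_counts, min_dna_cov=10):
    """Classify one genomic position using matched RNA-seq and DNA-seq counts.

    rna_counts / dna_counts map a base ("A", "C", "G", "T") to the number of
    reads supporting it. min_dna_cov is an illustrative threshold only.
    """
    # Only A positions with at least one G-supporting RNA read are of interest.
    if ref_base != "A" or rna_counts.get("G", 0) == 0:
        return "no_A_to_G_mismatch"
    # Without enough DNA coverage, a genomic variant cannot be ruled out.
    if sum(dna_counts.values()) < min_dna_cov:
        return "undetermined"
    # G reads in the DNA indicate a genomic polymorphism rather than editing.
    if dna_counts.get("G", 0) > 0:
        return "genomic_variant"
    return "editing_candidate"


if __name__ == "__main__":
    print(classify_site("A", {"A": 30, "G": 12}, {"A": 25}))
    print(classify_site("A", {"A": 30, "G": 12}, {"A": 13, "G": 12}))
```

A practical pipeline would also tolerate a low rate of sequencing errors in the DNA reads instead of rejecting a site on a single G read.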

Table 1. RNA-editing databases.

| Database | URL | Organisms | Description |
|---|---|---|---|
| DARNED [70] | https://darned.ucc.ie | Human, mouse, drosophila | A database of A-to-I and C-to-U RNA-editing sites in human, mouse, and fly. |
| RADAR [71] | http://rnaedit.com/ | Human, mouse, drosophila | A collection of annotated A-to-I editing sites in human, mouse, and fly transcripts. |
| REDIportal [72] | http://srv00.recas.ba.infn.it/atlas/index.html | Human | A database collecting more than 4.5 million A-to-I events in 55 body sites of 150 healthy individuals from the GTEx project. |

Table 2. List of tools to detect RNA-editing from sequencing data.

| Tool | Description | RNA-DNA comparison | RNA-seq alone | URL |
|---|---|---|---|---|
| REDItools [19] | Python scripts package developed to detect and quantify RNA-editing at genomic scale from next-generation sequencing data. | V | V | http://srv00.recas.ba.infn.it/reditools/ |
| GIREMI [24] | Calculates the mutual information (MI) of the mismatch pairs identified in the RNA-seq reads to distinguish RNA editing sites and SNPs. | X | V | https://www.ibp.ucla.edu/research/xiao/GIREMI.html |
| RESScanner [21] | A package for identification and annotation of RNA-editing sites from matched RNA-Seq and DNA-Seq data. | V | X | https://github.com/ZhangLabSZ/RES-Scanner |
| JACUSA [73] | JAVA framework to detect RNA-editing by comparing RNA-DNA and/or RNA-RNA sequencing samples. | V | Requires multiple RNA-seq | https://github.com/dieterich-lab/JACUSA |
| RNAEditor [74] | Tool to detect RNA editing events from either raw RNA-seq reads or mapped RNA reads. | X | V | http://rnaeditor.uni-frankfurt.de |
| DeepRed [16] | De novo identification of RNA editing sites with deep neural networks. | X | V | https://github.com/wenjiegroup/DeepRed |
| RED-ML [75] | RNA editing detection based on machine learning. | X | V | https://github.com/BGIRED/RED-ML |
| SPRINT [76] | A tool to detect RNA editing events in the absence of a SNP database. | X | V | http://sprint.tianlab.cn/ |
| RDDpred [77] | A tool to predict RNA-editing events using a random forest classifier. | X | V | http://epigenomics.snu.ac.kr/RDDpred/ |
| RCARE [78] | A tool and webserver for searching, annotating, and visualizing RNA-editing sites. | V | V | http://www.snubi.org/ |
| DREAM [79] | A webserver for the identification of editing sites in mature miRNAs using deep sequencing data. | X | miRNA-seq | http://www.cs.tau.ac.il/~mirnaed/ |

Table 3. Suggested filtering steps for detection and quantification of RNA-editing.

| Filter | Filtering rationale |
|---|---|
| Min. read coverage* | Exclude sequencing errors; increase the reliability of editing frequency estimation |
| Min. quality score* | Mainly to reduce the effect of sequencing errors |
| Min. mapping score* | Reduce misalignment errors |
| Min. number of reads that support variation* | Mainly to reduce the effect of sequencing errors |
| Min. editing frequency | To reduce the effect of sequencing errors or somatic mutations |
| Exclude multi-mapped reads* | Reduce misalignment errors |
| Exclude duplicated reads* | Increase the reliability of editing frequency estimation |
| Exclude genomic SNPs | Avoid false identification of polymorphic positions as editing sites |
| Trim bases in read ends | Reduce library preparation biases; low-quality sequencing in the ends |
| Sites near splice-junctions | Splice-junctions are prone to misalignment; such sites can either be filtered out or re-mapped |
| Exclude positions with multiple variations | To filter non-homozygous positions or positions that are prone to several types of errors |
| Clip overlapping read pairs | Increase the reliability of editing frequency estimation |
| Remove sites in homopolymers | Homopolymers lead to sequencing errors and misalignment |
| Remove sites in simple repeats | Reduce misalignment errors |
| Remove sites within high similarity regions | Remove false positives resulting from misalignment of reads onto very similar paralogous regions |
| Filter DNA variations** | Remove false positives resulting from somatic mutations or SNVs |

*Filtering steps can be used in both RNA- and DNA-seq.
**Requires matched DNA-seq/variant calling output.
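A few of the filters in Table 3 (minimum read coverage, minimum number of supporting reads, minimum editing frequency, and exclusion of genomic SNPs) can be combined as in the following Python sketch. The `Site` record, the function names, and all threshold values are hypothetical and chosen only for illustration; suitable cut-offs depend on sequencing depth and on the genomic context being analysed.

```python
from dataclasses import dataclass


@dataclass
class Site:
    chrom: str
    pos: int
    coverage: int      # total reads covering the position
    edited_reads: int  # reads supporting the A-to-G mismatch


def editing_frequency(site):
    """Editing level at a site: supporting reads divided by total coverage."""
    return site.edited_reads / site.coverage if site.coverage else 0.0


def passes_filters(site, known_snps, min_coverage=10,
                   min_edited_reads=3, min_frequency=0.01):
    """Apply a subset of the Table 3 filters (illustrative thresholds)."""
    if site.coverage < min_coverage:              # Min. read coverage
        return False
    if site.edited_reads < min_edited_reads:      # Min. supporting reads
        return False
    if editing_frequency(site) < min_frequency:   # Min. editing frequency
        return False
    if (site.chrom, site.pos) in known_snps:      # Exclude genomic SNPs
        return False
    return True


if __name__ == "__main__":
    sites = [
        Site("chr1", 100, 50, 10),  # passes all filters
        Site("chr1", 200, 5, 3),    # coverage too low
        Site("chr1", 300, 40, 1),   # too few supporting reads
        Site("chr2", 400, 30, 6),   # overlaps a known SNP
    ]
    known_snps = {("chr2", 400)}
    for s in sites:
        print(s.chrom, s.pos, passes_filters(s, known_snps))
```

Read-level filters from the table, such as base quality, mapping score, and duplicate removal, would be applied earlier, when the per-site counts are collected from the alignments.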

Highlights

Diverse methods have been developed to detect RNA editing, depending on the experimental setup.

Tailored methods have been developed to detect editing in regions of interest such as ALU elements, microRNAs, and heavily edited reads.

Global editing in a sample can be quantified by various measurements.

This manuscript is an overview of the principles underlying different approaches to detect and quantify RNA-editing utilizing high-throughput sequencing data.