Identification of novel alternative transcripts of the human Ribonuclease κ (RNASEK) gene using 3′ RACE and high-throughput sequencing approaches


Genomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Identification of novel alternative transcripts of the human Ribonuclease κ (RNASEK) gene using 3′ RACE and high-throughput sequencing approaches☆

Panagiotis G. Adamopoulos, Christos K. Kontos, Andreas Scorilas, Diamantis C. Sideris ⁎

Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, Athens, Greece.

ARTICLE INFO

Keywords: Alternative splicing; Splice variant; Nonsense-mediated mRNA decay (NMD); Next-generation sequencing (NGS); Cancer

ABSTRACT

The human RNASEK gene encodes Ribonuclease κ, an endoribonuclease that belongs to a highly conserved protein family of metazoans. Recent evidence suggests that the mRNA levels of the RNASEK gene possess biomarker attributes in patients with prostate cancer. In the present study, we used 3′ RACE and next-generation sequencing (NGS) to detect and identify novel RNASEK transcripts. Computational analysis of the NGS data revealed new alternative splicing events that support the existence of novel RNASEK alternative transcripts. As a result, eight RNASEK splice variants were discovered and their expression profile was analyzed with the use of nested PCR in a wide panel of human cell lines, originating from several cancerous and/or normal human tissues. Based on in silico analysis, six of the eight novel RNASEK transcripts are predicted to encode new protein isoforms, while the remaining two splice variants could be considered as nonsense-mediated mRNA decay (NMD) candidates.

1. Introduction

In eukaryotic cells, the primary transcripts of protein-coding genes (pre-mRNAs) contain both intronic and exonic sequences. All pre-mRNAs are processed by the spliceosome, which removes the introns to produce the mature mRNAs. However, through a molecular mechanism called alternative splicing, the exons forming each mature protein-coding transcript can be joined in different combinations. Consequently, alternative splicing leads to the production of multiple mRNAs from a single gene, and for this reason it has emerged as a fundamental mechanism for the augmentation of transcriptome and proteome diversity. Results derived from human genome analysis have indicated that about 92–94% of human genes have alternative splice forms [1], which helps to explain the considerable diversity of the human proteome [2]. In addition, up to 60% of all human genes produce at least one alternative transcript, and alternative splicing could potentially generate more alternative transcripts from one gene than the total number of genes in an entire genome [3,4].

Although it is associated with the majority of human genes, alternative splicing does not always occur in the same way. While many studies support that alternative splicing can lead to the production of tissue-specific transcripts [5], other reports indicate that the condition of the tissue plays a more important role in the splicing event [6]. In any case, since alternative splicing critically affects the exon/intron boundaries of the mature mRNAs produced in each human tissue, it can have a tremendous effect on gene regulation, cell cycle control, proliferation, apoptosis, angiogenesis and, ultimately, invasion and metastasis [7–10]. Since the relationship between alternative splicing and cancer is not yet fully understood, the identification of novel alternatively spliced transcripts that could be used as potential biomarkers for diagnostic and/or prognostic purposes and as targets for therapeutic strategies remains a major challenge. For this purpose, large-scale sequencing methodologies are a major tool for the identification of novel alternative splice variants associated with particular diseases.

Next-generation sequencing (NGS) has been one of the most powerful research tools of this decade, enabling high-throughput analysis of the human genome. One of the most significant applications of NGS,

☆ Sequence data from this article have been deposited to the GenBank Data Library under Accession No. KT277290.1, KT277292.1, KT277293.1, KT277295.1, KT762146.1, KT762147.1, KT762148.1, and KT762149.1 ⁎ Corresponding author at: Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, Panepistimiopolis, 15701 Athens, Greece. E-mail address: [email protected] (A. Scorilas).

https://doi.org/10.1016/j.ygeno.2019.06.010 Received 12 March 2019; Received in revised form 13 May 2019; Accepted 10 June 2019 0888-7543/ © 2019 Published by Elsevier Inc.

Please cite this article as: Panagiotis G. Adamopoulos, et al., Genomics, https://doi.org/10.1016/j.ygeno.2019.06.010


RNA-sequencing (RNA-seq), enables deep whole-transcriptome sequencing, thus offering various novel insights and unique capabilities. Besides whole-transcriptome sequencing, RNA-seq can be used to analyze more specific regions of the transcriptome (e.g. specific mRNAs), and as a result it is a powerful application for the detection of both coding and noncoding RNAs, alternatively spliced transcripts, allele-specific expression, as well as single nucleotide polymorphisms (SNPs) [11]. NGS approaches are expected to offer critical improvements in cancer therapeutics by enabling the identification of novel cancer-specific splice variants in tumor cell lines and samples. Specific splice variants that exist in certain cancer cell lines could be future targets for the therapy of various human malignancies [12]. In fact, clinical results have demonstrated that antisense therapy, which uses antisense oligonucleotides to down-regulate specific mRNAs that are proven to contribute to carcinogenesis, can be safely and effectively applied in patients [13,14]. In addition, the use of NGS for studying genes that have not been extensively investigated for specific alternative splice variants in cancer would be of great significance, because it could lead to novel alternative transcripts with diagnostic and prognostic attributes. Evidence regarding one of the best-known tumor suppressor genes, BRCA2, showed that one alternatively spliced BRCA2 transcript (delta12-BRCA2) is expressed at much higher levels in breast tumor tissues than in normal ones, suggesting a strong association with breast cancer [26]. One human gene that has not been thoroughly investigated for specific alternative splice variants in cancer is RNASEK. The human RNASEK gene encodes Ribonuclease κ (RNASEK), an endoribonuclease that belongs to a highly conserved protein family in metazoans [15,16].
The ability of ribonucleases to degrade RNA makes them key molecules that affect cell growth, apoptosis, angiogenesis and play roles in the development of human cancers [17]. In addition, it has become clear that ribonucleases can be used not only as biomarkers, but also as therapeutic agents [18,19]. The encoded human RNASEK is implicated in the endosomal pathway and closely associates with the vacuolar ATPase (V-ATPase) proton pump. In fact, recent scientific evidence has demonstrated that RNASEK is needed not only for the V-ATPase function, but also for the early events of endocytosis, since its depletion significantly decreases clathrin-mediated and non-clathrin-mediated endocytosis, resulting in the creation of enlarged clathrin-coated pits (CCPs) at the cell surface [20]. Furthermore, a recent study that applied RNAi screening in order to identify previously unknown genes implicated in viral infection, revealed that RNASEK is required for infection of every virus that enters cells through an acid-dependent pathway. Evidence strongly supports that RNASEK is not needed for virus binding to cells but, rather, is required for their uptake. RNASEK was found to be essential for diverse viruses that are dependent on clathrin-mediated endocytosis for entry, but was dispensable for general endocytic uptake. Therefore, RNASEK appears to play a unique role in viral uptake and may be a therapeutically viable target to inhibit major human viral pathogens [21]. However, despite the progress that has been made regarding the deciphering of RNASEK functionality and the role it plays in the cells, its detailed functional roles still remain unclear. Additionally, the implication of RNASEK in the processes of carcinogenesis and metastasis has been supported by multiple studies. 
Expression levels of RNASEK-01 mRNA showed a distinct increase (up to 9-fold) after treatment of breast cancer (BT-20) cells and ovarian cancer (SK-OV-3) cells with the antineoplastic agent paclitaxel [22]. Moreover, expressed sequence tag (EST) analysis showed that the RNASEK gene is transcribed in the vast majority of human tissues and developmental stages. In fact, 75% of these sequences are present in cancer tissues and 25% of them are present in normal tissues [15]. These indications suggest that RNASEK could be a useful diagnostic and/or prognostic biomarker. Despite the light that has already been shed on RNASEK and its connection to cancer, there is still limited information about cancer-specific alternative splice variants of RNASEK in cancer cell lines and samples.

The human RNASEK gene is located on chromosome 17, and existing evidence has confirmed three alternative transcripts. The main RNASEK transcript (RNASEK-01, NM_001004333.4) consists of three exons and encodes the major isoform of RNASEK, most likely consisting of 137 amino acid residues (aa). Protein expression analysis of RNASEK-01 mRNA in Pichia pastoris revealed that a disulphide bond between cysteine residues 6 and 69 is essential for the ribonucleolytic activity of the human enzyme [23]. A recent study by members of our group revealed a subtle alternative splice variant, which leads to a protein coding mRNA variant (RNASEK-02, NR_037716.1). The RNASEK-01 and RNASEK-02 mRNA transcripts are almost identical and differ only in 4 nucleotides, as RNASEK-02 lacks the sequence GTTG, which corresponds to the last four nucleotides of the first exon [24]. In addition to RNASEK-01 and RNASEK-02, alternative splicing events lead to the production of a noncoding mRNA transcript (miscellaneous RNA, NR_037715.1), which possesses an additional exon located between the first two exons of RNASEK-01 and RNASEK-02. In this study, we describe the identification of eight novel RNASEK transcripts with the use of nested 3′ rapid amplification of cDNA ends (nested 3′ RACE) and high-throughput sequencing approaches, as well as their expression analysis in a wide panel of human cell lines, originating from several cancerous and/or normal human tissues.

2. Results

2.1. RNASEK gene coverage and novel findings

After the NGS run on the Ion PGM™ System, a FASTQ file containing the sequencing reads derived from nested 3′ RACE in a pool of cDNAs from 55 human cell lines was obtained. The FASTQ file was then uploaded to the open source, web-based GALAXY platform, where the HISAT2 algorithm was used for the alignment of sequencing reads against the human reference genome (GRCh38), as described in Methods.

Bioinformatic analysis of the acquired NGS data confirmed the existence of all the annotated RNASEK splice junctions (Table 1). Additionally, due to the small cDNA length of RNASEK, the majority of the obtained sequencing reads covered entire RNASEK mRNA sequences from the first to the last exon of the transcript. Computational analysis also led to the detection of novel, much less abundant RNASEK transcripts (Table 1), which contain exon extensions and exon truncations in their cDNA sequences (Suppl. Fig. 1). The raw sequencing reads that confirm the existence of these novel transcripts (Suppl. Fig. 2) aligned perfectly to the reference genome (GRCh38), as visualized with the Integrative Genomics Viewer (IGV) (Fig. 1). As a result, this study led to the identification of eight novel alternative RNASEK transcripts, which were revealed without any assembly, as the entire cDNA sequence was covered by single sequencing reads. The nucleotide sequences of these transcripts (namely RNASEK v.4, v.6, v.7, v.9, v.17, v.18, v.19, and v.20) were deposited in GenBank® (GenBank ID: KT277290.1, KT277292.1, KT277293.1, KT277295.1, KT762146.1, KT762147.1, KT762148.1, and KT762149.1, respectively).

2.2. Novel transcript RNASEK v.4

Analysis of the NGS data revealed the existence of a subtle alternative splice variant of RNASEK (RNASEK v.4), which is almost identical with the annotated non-coding mRNA transcript (miscellaneous RNA, NR_037715.1). The difference between the two transcripts is only 4 nucleotides, since RNASEK v.4 lacks the sequence GTTG that corresponds to the last four nucleotides of the first exon. This subtle change in the mRNA sequence could have important effects, since RNASEK v.4 mRNA (unlike the annotated miscellaneous RNA transcript) contains an open reading frame (ORF) and therefore is predicted to encode an alternative RNASEK isoform (is.3) that consists of 208 aa (Fig. 2).
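One reason a four-nucleotide difference can change coding potential is that four is not a multiple of three, so every codon downstream of the GTTG site is read in a different frame. The sketch below illustrates this on a made-up sequence; the toy sequence, the position of GTTG within it, and the helper names `codons` and `first_stop_index` are illustrative assumptions and are not taken from the RNASEK sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete triplets, reading from its first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def first_stop_index(triplets):
    """Return the index of the first stop codon, or None if there is none."""
    for index, triplet in enumerate(triplets):
        if triplet in STOP_CODONS:
            return index
    return None


# Hypothetical toy sequence (NOT RNASEK): a start codon, then GTTG a few bases in.
toy_with_gttg = "ATGGCAGTTGCCTGAAGACTTTAA"
# Same toy sequence with the first GTTG removed, mimicking the 4-nt deletion.
toy_without_gttg = toy_with_gttg.replace("GTTG", "", 1)

for label, seq in (("with GTTG", toy_with_gttg), ("without GTTG", toy_without_gttg)):
    triplets = codons(seq)
    print(label, triplets, "first stop at codon index:", first_stop_index(triplets))
```

In this toy case the version containing GTTG runs into an in-frame stop codon early, whereas the version lacking it does not; the real effect on each RNASEK transcript depends on its actual sequence.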


Fig. 1. Visualization of the eight aligned sequencing reads corresponding to the respective novel splice variants of RNASEK against the reference genome, using the Integrative Genomics Viewer (IGV). The alignment process was carried out with the HISAT2 algorithm, and the produced BAM and BED files were loaded in the IGV genome browser.
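To put the "much less abundant" novel transcripts in proportion, the read counts reported in Table 1 can be tallied as in the following minimal sketch. The counts and the FASTQ total are copied from Table 1; the variable names and the grouping into two lists are illustrative choices.

```python
# Reads confirming each unique transcript, copied from Table 1.
READS = {
    "RNASEK-01": 41_962, "v.2": 10, "RNASEK-02": 11_577,
    "v.4": 129, "v.6": 14, "v.7": 106, "v.9": 56,
    "v.17": 50, "v.18": 8, "v.19": 8, "v.20": 5,
}
TOTAL_READS = 225_780  # total number of reads in the FASTQ file (Table 1 footnote)

ANNOTATED = ["RNASEK-01", "v.2", "RNASEK-02"]
NOVEL = ["v.4", "v.6", "v.7", "v.9", "v.17", "v.18", "v.19", "v.20"]

annotated_total = sum(READS[name] for name in ANNOTATED)
novel_total = sum(READS[name] for name in NOVEL)

# Share of all reads in the FASTQ file that confirm each transcript.
for name, count in READS.items():
    print(f"{name:10s} {count:7d} reads  {100 * count / TOTAL_READS:7.4f}% of all reads")

print("annotated:", annotated_total, "novel:", novel_total)
print(f"novel / annotated = {100 * novel_total / annotated_total:.2f}%")
```

Only reads that confirm a unique transcript are counted in Table 1, so these totals are lower than the total number of reads in the FASTQ file.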

Table 1
Number of sequencing reads supporting the existence of each unique novel RNASEK transcript in the FASTQ file.

RNASEK transcript                 Sequencing reads^a confirming a unique transcript
Previously annotated transcripts
  RNASEK-01 (NM_001004333.4)      41,962
  v.2 (NR_037715.1)               10
  RNASEK-02 (NR_037716.1)         11,577
Novel transcripts
  v.4 (KT277290.1)                129
  v.6 (KT277292.1)                14
  v.7 (KT277293.1)                106
  v.9 (KT277295.1)                56
  v.17 (KT762146.1)               50
  v.18 (KT762147.1)               8
  v.19 (KT762148.1)               8
  v.20 (KT762149.1)               5

^a Total number of reads in the FASTQ file: 225,780.

Fig. 2. Detailed structural demonstration of all annotated and novel RNASEK transcripts. Exons are depicted as boxes and introns as lines; gray and white boxes represent coding and non-coding exons, respectively. Numbers inside boxes and above lines indicate the length of each exon or intron in nucleotides. For each transcript that was predicted to encode a protein isoform, the translation start and stop codons are also shown. The variant number, the GenBank® accession number, and the number of each protein isoform (only for transcripts that were predicted to have an open reading frame) are shown next to each splice variant. The target sites of the variant-specific primers that were used for the expression analysis of each novel RNASEK transcript are also shown.

2.3. Novel transcripts RNASEK v.6 and RNASEK v.17

The bioinformatic analysis of the NGS data revealed the existence of two novel transcripts (RNASEK v.6 and v.17), which are characterized by a truncated exon 2. In the annotated non-coding mRNA transcript (miscellaneous RNA, NR_037715.1), which is the only annotated RNASEK transcript containing this exon, exon 2 consists of 217 nucleotides. However, in these two novel transcripts that were covered by sequencing reads (RNASEK v.6 and RNASEK v.17), exon 2 is truncated by 19 nucleotides at its 3′ end, thus lacking the sequence 5′-GTGGCTGAGTCCGAATCCA-3′ and having a total length of 198 nucleotides (Fig. 2). The only difference between the nucleotide sequences of RNASEK v.6 and RNASEK v.17 concerns the sequence GTTG, which is absent from RNASEK v.6 and present in RNASEK v.17. Finally, both novel transcripts contain a premature translation termination codon (PTC) and therefore represent nonsense-mediated mRNA decay (NMD) candidates (Fig. 2).

2.4. Novel transcript RNASEK v.7

In addition to the aforementioned results, computational analysis also revealed a rare transcript variant that contains a small extension of 16 nucleotides at the 3′ end of exon 1 (sequence 5′-GTGAGGGGACTCCCCG-3′), located after the GTTG sequence (Suppl. Fig. 1). Based on the obtained sequencing reads, the extended exon 1 is spliced to exon 3, producing a novel RNASEK transcript, RNASEK v.7 (Fig. 2). Besides this extension, RNASEK v.7 is identical with the main RNASEK transcript (RNASEK-01, NM_001004333.4). Finally, RNASEK v.7 has an ORF and as a result it is predicted to encode a new RNASEK isoform (is.5), which consists of 85 aa (Fig. 2).

2.5. Novel transcripts RNASEK v.9 and RNASEK v.18

Additionally, two more novel transcript variants of RNASEK (RNASEK v.9 and RNASEK v.18), containing an extended exon 3, were identified. In both annotated transcripts RNASEK-01 and RNASEK-02, this exon consists of 77 nucleotides and neighbors exon 1 (with GTTG in RNASEK-01 or without GTTG in RNASEK-02) at its 5′ end and the last exon of the mRNA transcript at its 3′ end (Fig. 2). In these two novel transcripts (RNASEK v.9 and RNASEK v.18), however, exon 3 is extended by 22 nucleotides at its 5′ end (sequence: 5′-TCTACCCATTCCCCTTTTCCAG-3′), having a total length of 99 nucleotides (Fig. 2). Based on the sequencing reads, RNASEK v.9 contains this extension along with the GTTG sequence, thus being highly similar to RNASEK-01, while RNASEK v.18 contains this extension but lacks GTTG from its mRNA sequence, being more similar to RNASEK-02. Both novel transcripts have ORFs and are predicted to encode alternative RNASEK isoforms (is.6 and is.7, respectively).

2.6. Novel transcripts RNASEK v.19 and RNASEK v.20

Finally, two other novel transcripts (RNASEK v.19 and RNASEK v.20), containing two distinct extensions in exon 4, were identified (Fig. 2 and Suppl. Fig. 1). Besides their extended last exon, both transcripts are identical with RNASEK-01, since they contain the GTTG sequence. In detail, the novel transcript RNASEK v.19 is characterized by an extension of 166 nucleotides, which is located at the 5′ end of the last exon (sequence: 5′-TCCACCTCTGCAAAATGGCTATGACAGATCTCACCCCATAGGATGGTCAAGAAGATTGAATAAAGTAATACACGTAACAGCGCCTAAACAGGTGTCGGGCACATAGTGCTCAGAAAATGTTGGCTCCTATTAAAGTTTGACCCCTTTGCTTCATTCCCACCCCCAG-3′). The second novel transcript (RNASEK v.20) is very similar to RNASEK v.19, as their only difference is the length of the extension. In particular, the extension of the last exon in RNASEK v.20 is 115 nucleotides long (sequence: 5′-AAGATTGAATAAAGTAATACACGTAACAGCGCCTAAACAGGTGTCGGGCACATAGTGCTCAGAAAATGTTGGCTCCTATTAAAGTTTGACCCCTTTGCTTCATTCCCACCCCCAG-3′). Both RNASEK v.19 and RNASEK v.20 have ORFs and as a result they are predicted to encode two novel RNASEK isoforms (is.8 and is.9, respectively).
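The lengths quoted for the exon extensions and truncations in Sections 2.3 to 2.6 can be cross-checked directly against the printed sequences. The short sketch below does this; the sequences are transcribed from the text with line-break spaces removed, the 2R primer sequence is taken from Table 2, and the constant and function names are illustrative.

```python
# Sequences transcribed from Sections 2.3-2.6 (whitespace from line breaks removed).
EXON2_TRUNCATED_PART = "GTGGCTGAGTCCGAATCCA"        # missing from v.6 and v.17
EXON1_EXTENSION_V7 = "GTGAGGGGACTCCCCG"             # 3' extension of exon 1
EXON3_EXTENSION = "TCTACCCATTCCCCTTTTCCAG"          # 5' extension in v.9 and v.18
EXON4_EXTENSION_V19 = (
    "TCCACCTCTGCAAAATGGCTATGACAG" "ATCTCACCCCATAG"
    "GATGGTCAAGAAGATTGAATAAAGTAATACACGTAACAGCGCCTAAAC"
    "AGGTGTCGGGCACATAGTGCTCAGAAAATGTTGGCTCCTATTAAAGTTT"
    "GACCCCTTTGCTTCATTCCCACCCCCAG"
)
EXON4_EXTENSION_V20 = (
    "AAGATTGA" "ATAAAGTAATACACGTAACAGCGCCTAAACAGGTGTCGGGCACATA"
    "GTGCTCAGAAAATGTTGGCTCCTATTAAAGTTTGACCCCTTTGCTTCATT"
    "CCCACCCCCAG"
)
PRIMER_2R = "TGGATTCGGACTCAGCCAC"  # reverse primer 2R from Table 2


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Lengths stated in the text.
print(len(EXON2_TRUNCATED_PART), len(EXON1_EXTENSION_V7), len(EXON3_EXTENSION))
print(len(EXON4_EXTENSION_V19), len(EXON4_EXTENSION_V20))
# Exon lengths after truncation or extension, as stated in the text.
print(217 - len(EXON2_TRUNCATED_PART), 77 + len(EXON3_EXTENSION))
# Is the v.20 extension the 3'-terminal part of the v.19 extension?
print(EXON4_EXTENSION_V19.endswith(EXON4_EXTENSION_V20))
# Does primer 2R anneal to the 19 nt that v.6 and v.17 lack?
print(reverse_complement(PRIMER_2R) == EXON2_TRUNCATED_PART)
```

Both comparisons at the end hold for the sequences as printed: the shorter exon 4 extension is contained at the 3′ end of the longer one, and primer 2R is the reverse complement of the 19 nucleotides that are absent from the truncated exon 2.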

2.7. Expression profile of the novel RNASEK transcripts

Following the identification of all eight novel RNASEK transcripts, the expression profile of each splice variant was examined in a panel of cDNA pools corresponding to distinct cancerous and normal human tissues (breast adenocarcinoma, ductal carcinoma of the breast, ovarian cancer, endometrial adenocarcinoma, cervical carcinoma, prostate cancer, urinary bladder cancer, renal cell carcinoma, colorectal cancer, gastric adenocarcinoma, hepatocellular carcinoma, brain cancer, lung adenocarcinoma, melanoma, lymphoma, leukemia, normal embryonic kidney, normal pancreas and head and neck squamous cell carcinoma). The analysis was performed with nested PCR using variant-specific primers (Table 2 and Suppl. Table 1), followed by electrophoretic detection of the acquired PCR products in agarose gels (Fig. 3). All novel RNASEK transcripts were found to be widely expressed in all cell lines examined, suggesting an important biological role for this gene. On the other hand, the very low abundance of the novel splice variants as compared to the relatively high representation of the two previously annotated transcripts did not allow the simultaneous detection of all transcripts with a common set of primers. In addition, the nested PCR amplicons corresponding to the eight novel RNASEK transcripts were subjected to Sanger sequencing for the validation of their nucleotide sequences (Suppl. Data). However, it should be mentioned that, due to the subtle alternative splicing event occurring at the junction of exons 1 and 2, the RNASEK v.2 and v.4 transcripts as well as the RNASEK v.6 and v.17 transcripts differ only in the presence/absence of GTTG. As a result, the variant-specific primer pairs that were used for the validation and expression analysis of these novel transcripts may amplify both transcripts of each pair due to potential mispriming.

Fig. 3. Expression analysis of each novel RNASEK splice variant with nested PCR using variant-specific pairs of primers, in a panel of cDNA pools corresponding to distinct cancerous and normal human tissues. cDNAs derived from established cell lines were mixed to produce cDNA pools, each one representing a different human tissue. Each PCR product derives from the amplification of a single novel RNASEK transcript.

2.8. Structural analysis of the predicted novel protein isoforms

All novel transcripts were examined for the presence of an ORF, in order to determine whether they have the potential to encode protein isoforms of RNASEK or are NMD candidates. The discrimination of these transcripts was based on the fact that NMD is elicited by premature translation termination codons (PTCs) residing 5′ to a limit of approximately 50 nucleotides upstream of the last exon or exon junction, whereas mRNAs with a PTC 3′ to this limit are usually stable [25–27]. As a result, 6 of the novel alternative splice variants of RNASEK are predicted to encode new protein isoforms, whereas the remaining 2 novel alternative splice variants of RNASEK are NMD candidates. In order to generate a predicted 3D structure model for the novel protein isoforms, we used the I-TASSER server, an online tool for 3D structure model construction (Fig. 4) [28,29].

3. Discussion

In the present study, we present eight novel alternatively spliced variants of the human RNASEK gene that were identified in 55 human cancer cell lines, using nested 3′ RACE and NGS methodology. These newly discovered RNASEK transcripts are 5′-partial, as their sequences start from the target region of the forward gene-specific primer that was used for the nested 3′ RACE, and therefore their 5′-untranslated region (5′ UTR) remains undetermined. The identification of these novel transcripts was achieved by applying an in-house–developed DNA-seq assay, which allowed the discovery of much less abundant splice variants. Although RNA-seq data from whole-transcriptome sequencing experiments offer a tremendous amount of information for analysis, they are characterized by markedly lower coverage of rare transcripts, which most often result from alternative splicing of pre-mRNAs.

Cancer research has advanced rapidly during the past decade, mainly due to high-throughput sequencing technology and the data it provides. NGS can be a powerful tool that will accelerate genomic discoveries and help researchers translate them into clinically significant information. NGS is expected to expand the scale of the genomic research time and cost

Table 2
Primers used for the molecular cloning of all novel RNASEK transcripts by 3′ RACE and nested 3′ RACE, and for their expression analysis in human cell lines by PCR.

Primers            Direction            Name          Sequence (5′ → 3′)                Length (nt)  Tm (°C)^a
Molecular cloning
  3′ RACE          Forward              F(ATG)        GATGGGATGGTTGAGGCCG               19           60.5
  3′ RACE          Reverse (Universal)  R(Outer)      GCGAGCACAGAATTAATACGACT           23           59.2
  Nested 3′ RACE   Forward              F(Nested)     CCCCTGCGAGGGCATCC                 17           62.3
  Nested 3′ RACE   Reverse (Universal)  R(Inner)      AGCACAGAATTAATACGACTCACTATAGG     29           60.7
Expression analysis of novel RNASEK transcripts
                   Forward              1tr/2F        GAGTGATCATTAATCCACCCACC           23           58.9
                   Forward              1ext/3F       ACTCCCCGATAATGCTCGGAA             21           61.0
                   Forward              1/3extF       ATGTTGTCTACCCATTCCCCTT            22           59.1
                   Forward              1/2F          CATGTTGTAATCCACCCACCG             21           59.3
                   Forward              1tr/3extF     GATCATTCTACCCATTCCCCTTTTC         25           59.5
                   Forward              1/3F          GGAGTGATCATGTTGATAATGCTCG         25           60.1
                   Reverse              2R            TGGATTCGGACTCAGCCAC               19           59.4
                   Reverse              2tr/3R        CGAGCATTATCTGGAACAGGGC            22           61.4
                   Reverse              3/4R          CTGGGGGCCATTCTCAAAATCT            22           60.6
                   Reverse              3/4ext(a)R    TTGCAGAGGTGGATCAAAATCTT           23           58.6
                   Reverse              3/4ext(b)R    TACTTTATTCAATCTTTCAAAATCTTTCTC    30           56.2

^a Melting temperatures (Tm) were calculated by Primer-BLAST.
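As a quick consistency check on Table 2, the reported primer lengths can be recomputed from the sequences, and the two universal reverse primers can be located within the oligo-dT adaptor used for reverse transcription (Section 4.2). The sketch below is illustrative only: the sequences are copied from Table 2 and Section 4.2, the names are arbitrary, and the GC percentage is a simple base count, not a reproduction of the Primer-BLAST Tm calculation.

```python
# Adaptor portion of the oligo-dT-adaptor primer (Section 4.2), without the T-stretch and VN.
ADAPTOR_CORE = "GCGAGCACAGAATTAATACGACTCACTATAGG"

# name -> (sequence, length reported in Table 2)
PRIMERS = {
    "F(ATG)": ("GATGGGATGGTTGAGGCCG", 19),
    "R(Outer)": ("GCGAGCACAGAATTAATACGACT", 23),
    "F(Nested)": ("CCCCTGCGAGGGCATCC", 17),
    "R(Inner)": ("AGCACAGAATTAATACGACTCACTATAGG", 29),
    "1tr/2F": ("GAGTGATCATTAATCCACCCACC", 23),
    "1ext/3F": ("ACTCCCCGATAATGCTCGGAA", 21),
    "1/3extF": ("ATGTTGTCTACCCATTCCCCTT", 22),
    "1/2F": ("CATGTTGTAATCCACCCACCG", 21),
    "1tr/3extF": ("GATCATTCTACCCATTCCCCTTTTC", 25),
    "1/3F": ("GGAGTGATCATGTTGATAATGCTCG", 25),
    "2R": ("TGGATTCGGACTCAGCCAC", 19),
    "2tr/3R": ("CGAGCATTATCTGGAACAGGGC", 22),
    "3/4R": ("CTGGGGGCCATTCTCAAAATCT", 22),
    "3/4ext(a)R": ("TTGCAGAGGTGGATCAAAATCTT", 23),
    "3/4ext(b)R": ("TACTTTATTCAATCTTTCAAAATCTTTCTC", 30),
}


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Primers whose sequence length disagrees with the tabulated length (expected: none).
length_mismatches = [name for name, (seq, n) in PRIMERS.items() if len(seq) != n]

# Position of each universal reverse primer within the adaptor (-1 if absent).
outer_offset = ADAPTOR_CORE.find(PRIMERS["R(Outer)"][0])
inner_offset = ADAPTOR_CORE.find(PRIMERS["R(Inner)"][0])

for name, (seq, _) in PRIMERS.items():
    print(f"{name:11s} {len(seq):2d} nt  GC {gc_percent(seq):4.1f}%")
print("length mismatches:", length_mismatches)
print("R(Outer) offset:", outer_offset, "R(Inner) offset:", inner_offset)
```

With the sequences as printed, R(Outer) matches the 5′ end of the adaptor and R(Inner) begins three bases further in, which is the nesting one would expect for the second amplification round.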


Fig. 4. Models of the protein isoforms that are predicted to be encoded by the novel RNASEK splice variants, using the I-TASSER web server. For each protein, only the 3D structure with the highest confidence score is demonstrated.
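The isoforms modelled in Fig. 4 come from transcripts that were not classified as NMD candidates in Section 2.8. The positional rule described there (a premature termination codon lying more than roughly 50 nucleotides upstream of the last exon-exon junction) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of 0-based mRNA coordinates, and the example positions are all hypothetical, and "last exon or exon junction" is read here as the last exon-exon junction.

```python
def is_nmd_candidate(stop_codon_pos, last_junction_pos, limit=50):
    """Apply the ~50-nt rule to one transcript.

    stop_codon_pos    -- mRNA coordinate of the first in-frame stop codon
    last_junction_pos -- mRNA coordinate of the last exon-exon junction
    limit             -- distance threshold in nucleotides (about 50)

    Returns True when the stop codon lies more than `limit` nucleotides
    upstream of the last exon-exon junction.
    """
    return (last_junction_pos - stop_codon_pos) > limit


# Hypothetical coordinates, for illustration only (not RNASEK data).
EXAMPLES = {
    "stop far upstream of last junction": (120, 400),
    "stop within 50 nt of last junction": (380, 400),
    "stop in the last exon": (450, 400),
}

for label, (stop_pos, junction_pos) in EXAMPLES.items():
    print(f"{label}: NMD candidate = {is_nmd_candidate(stop_pos, junction_pos)}")
```

A positional rule of this kind only flags candidates; whether a given transcript is actually degraded would need experimental confirmation.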

effectively and to have a high impact on the diagnosis, prognosis as well as the treatment of many diseases [30]. Although many challenges remain, NGS has been used for genetic screening, diagnostics, as well as clinical assessment, enabling clinicians to plan better treatment or to make more accurate diagnostic decisions for patients suffering from cancer [31]. NGS enables researchers to discover and identify novel transcript modifications [30], transcribed somatic mutations [32], as well as novel transcripts created by alternative splicing events [33,34]. In fact, as shown in the present study regarding the novel RNASEK transcripts, even rare alternative splicing events, including alternative 5′ or 3′ UTRs, exon swaps, intron retentions, 5′ or 3′ extensions of several exons, alternative promoters as well as alternative polyadenylation sites, which could not be detected by previous technologies, can now be identified.

The ability of NGS to efficiently identify novel cancer-specific splice variants could prove to be of high significance, because many new transcripts could serve as potential biomarkers, thus contributing to better diagnosis or prognosis of various human malignancies, or even as targets for therapy in case they are associated with cancer progression. The continuing identification of novel splice variants in cancerous tissues and cell lines will provide a large and rapidly expanding array of potential therapeutic targets. Multiple recent reports have indicated differential expression levels, profiles and functions of splice variants of many genes under normal and disease conditions. In a recent study by members of our group, aiming to investigate RNASEK mRNA expression in prostate cancer and benign prostatic hyperplasia samples, RNASEK mRNA levels were found to be significantly decreased in prostate cancer compared to benign prostatic hyperplasia. Additionally, overexpression of RNASEK mRNA was found to be associated with a decreased hazard of prostate cancer development, being able to distinguish patients with prostate cancer from those with benign prostatic hyperplasia, independently of serum PSA levels, and as a result RNASEK mRNA expression was found to possess strong biomarker value in prostate cancer [35]. For this reason, the newly discovered RNASEK transcripts that are presented in the current study may represent putative novel biomarkers for prostate cancer or other human malignancies, and their quantification in human samples should be a main future challenge.

Since RNA stability is highly involved in the pathogenesis of numerous malignancies, such as lung cancer, all factors involved could serve as new potential targets for cancer therapy [36]. Among these factors, RNA-degrading enzymes, known as ribonucleases (RNases), which are firmly associated with the development of various human malignancies [37], could be the most attractive targets for future experiments regarding cancer therapy. As a result, all novel transcripts of the RNASEK gene presented in this study should be further investigated, because they might possess novel and unique characteristics that may shed some light on many unanswered questions regarding the implication of RNA degradation in carcinogenesis. In particular, the experimental validation of the six novel protein isoforms predicted in this work is imperative, because a novel isoform could possess a critical function in RNA degradation under both normal and cancer conditions.

Despite the enormous capabilities of NGS, this technology still has certain limitations that need to be addressed, some of which are present in this study as well. Apart from the lengthy experimental procedure [38], the amplification of DNA libraries not only creates biased reads, but also makes epigenetic modifications undetectable. In addition, the short reads that NGS produces offer valuable, but at the same time limited information, thus requiring accurate and well-designed algorithms to synthesize sequences from raw NGS data [39]. As a result, since our understanding of genomics is limited [40], without the development of advanced bioinformatics systems, the capabilities of NGS technology will not be fully exploited [41].

(5′-GCGAGCACAGAATTAATACGACTCACTATAGGTTTTTTTTTTTTVN-3′, where V = G, A, C and N = G, A, T, C) as primer. After the cDNA synthesis, the 55 cDNA samples were mixed in a total of 19 cDNA pools, based on the tissue of origin or the type of malignancy of each cell line. 4.3. 3′-rapid amplification of cDNA ends PCR (3′ RACE) For the amplification of the RNASEK transcripts in the 19 cDNA pools, the 3′ RACE methodology was applied, because it enables the use of the natural poly(A) tail of the 3′ end of mRNAs as a priming site for PCR and therefore enables the investigation of 3′ UTRs. For this purpose, a universal reverse primer [R(Outer)] was designed to target the sequence of the oligo-dT-adaptor that was used in the reverse transcription (Table 2). However, in order for 3′ RACE to be specific for RNASEK mRNAs, a forward gene-specific primer [F(ATG)] was used and was designed to anneal to the region of the annotated translation start codon (ATG). Consequently, by the end of 3′ RACE, the amplification of the mRNA template between the defined internal site and the 3′ end of the mRNA was accomplished [43]. The 19 derived 3′ RACE products were appropriately diluted in nuclease-free water and were used as templates for nested 3′ RACE. As a result, another forward gene-specific primer was designed [F(Nested)], targeting the region located a few bases downstream from “F(ATG)”, and was used along with an inner universal reverse primer [R(Outer)] (Table 2). The purpose of nested 3′ RACE was to increase the reaction sensitivity and specificity for RNASEK mRNAs. For trustworthy results as well as to increase the quality of primer designing, all primers were designed using Primer-BLAST. In this way it was able to achieve more accurate and improved results regarding multiple parameters, such as intermolecular complementarity and splice variant handling [44]. 
Both 3′ RACE and nested 3′ RACE was performed using Veriti® 96Well Fast Thermal Cycler (Applied Biosystems®), in 25 μL reactions containing MgCl2-free KAPA Taq Buffer C (Kapa Biosystems Inc., Woburn, MA, USA), 1.5 mM MgCl2, 0.2 mM dNTPs, 50 pmol of each primer, and 2 units of KAPA Taq DNA Polymerase (Kapa Biosystems Inc.). In addition, the cycling conditions that were applied were the following: a denaturation step at 95 °C for 3 min, followed by 35 cycles of 95 °C for 30 s, 65 °C for 30 s, 72 °C for 1 min, and a final extension step at 72 °C for 2 min. However, due to the appearance of spurious smaller bands in the product spectrum, which are expected in 3′ RACE experiments, Touchdown PCR was carried out in order to increase sensitivity, specificity as well as the yield [45]. Therefore, an auto-delta of −0,3 °C was applied per cycle, starting from the second cycle, which in other words means that in every cycle the annealing temperature was decreased by −0,3 °C, leading to better appearances in the agarose gel bands.

4. Materials and methods

4.1. Biological material
This study was performed using an established panel of 55 human cell lines, which included: MCF-7, SK-BR-3, BT-20, MDA-MB-231, MDA-MB-468 (breast adenocarcinoma), BT-474, T-47D, ZR-75-1 (ductal carcinoma of the breast), OVCAR-3, SK-OV-3, ES-2, MDAH-2774 (ovarian cancer), Ishikawa, SK-UT-1B (endometrial adenocarcinoma), HeLa, SiHa (cervical carcinoma), PC-3, DU 145, LNCaP (prostate cancer), T24, RT4 (urinary bladder cancer), ACHN, 786-O, Caki-1 (renal cell carcinoma), Caco-2, DLD-1, HT-29, HCT 116, SW 620, COLO 205, RKO (colorectal cancer), AGS (gastric adenocarcinoma), Hep G2, HuH7 (hepatocellular carcinoma), U-87 MG, U-251 MG, D54, H4, SH-SY5Y (brain cancer), A549 (lung adenocarcinoma), FM3, MDA-MB-435S (melanoma), Raji, Daudi, U-937 (lymphoma), K-562, HL-60, Jurkat, REC-1, SU-DHL-1, GRANTA-519 (leukemia), HEK293 (normal embryonic kidney), 1.2B4 (normal pancreas), and BB49-SCCHN, CAL-33 (head and neck squamous cell carcinoma). All the above cell lines were cultured according to the American Type Culture Collection guidelines.

4.4. Purification of the nested 3′ RACE products
All nested 3′ RACE products were cleaned up using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany) before being used in the library construction workflow. After clean-up, all samples were assessed spectrophotometrically at 260 and 280 nm, using the BioSpec-nano Micro-volume UV–Vis Spectrophotometer (Shimadzu Scientific Instruments), and stored at −20 °C until the next step of the experimental procedure.
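Spectrophotometric assessment at 260 and 280 nm is commonly summarized as a concentration estimate and an A260/A280 purity ratio. The sketch below uses the conventional conversion factors for a 10 mm path length (about 50 ng/μL per absorbance unit for double-stranded DNA and about 40 ng/μL for RNA); the absorbance readings are invented example values, the function name is hypothetical, and the text does not report which thresholds or factors were actually applied.

```python
def nucleic_acid_qc(a260, a280, factor_ng_per_ul=50.0, dilution=1.0):
    """Estimate concentration (ng/uL) and A260/A280 ratio from absorbances.

    factor_ng_per_ul: conventional factor for a 10 mm path length
                      (about 50 for dsDNA, about 40 for RNA).
    dilution:         dilution factor of the measured aliquot.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive to compute a purity ratio")
    concentration = a260 * factor_ng_per_ul * dilution
    ratio = a260 / a280
    return concentration, ratio


# Hypothetical readings for a purified PCR product (not data from the study).
conc, ratio = nucleic_acid_qc(a260=0.9, a280=0.5)
print(f"{conc:.1f} ng/uL, A260/A280 = {ratio:.2f}")
```

A ratio near 1.8 is usually taken as indicative of DNA with little protein carry-over, and a ratio near 2.0 plays the same role for RNA; passing `factor_ng_per_ul=40.0` adapts the estimate to RNA samples.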

4.2. Total RNA extraction and reverse transcription
Total RNA was extracted from all 55 human cell lines and was stored in RNA Storage Solution (Life Technologies Ltd., Carlsbad, CA, USA) at −80 °C until use. The purity and concentration of each RNA sample were assessed spectrophotometrically at 260 and 280 nm, using the BioSpec-nano Micro-volume UV–Vis Spectrophotometer (Shimadzu Scientific Instruments) [42]. Next, reverse transcription was performed in reaction volumes of 20 μL, using 5 μg of total RNA from each cell line, M-MLV Reverse Transcriptase (Life Technologies Ltd.) as well as an oligo-dT-adaptor

4.5. Library construction and quantification
A volume of 10 μL of each purified nested 3′ RACE product was pooled into a single PCR product sample. An initial amount of 100 ng of this pool was then used for the library construction workflow, which was carried out using the Ion Xpress™ Plus Fragment Library Kit (Cat. no. 4471269). This kit provides the Ion Shear™ Plus Reagents, which are used for the enzymatic fragmentation of double-stranded DNA, plus the contents of the Ion Plus Fragment Library Kit for library preparation from the enzymatically fragmented DNA. In the first step of the workflow, the Ion Shear™ Plus Reagents (Ion Torrent™) were used for the enzymatic fragmentation of the PCR product, followed by purification of the fragmented DNA. Then, adapter ligation, nick-repair and purification of the ligated DNA were performed. Finally, size selection of the unamplified library (400-base-read library) was accomplished using an E-Gel® SizeSelect™ 2% Agarose Gel (Invitrogen™). The size-selected library was quantified using the Ion Library TaqMan™ Quantitation Kit (Ion Torrent™) on an ABI 7500 Fast Real-Time PCR System (Applied Biosystems™).

Co. KG), before being subjected to Sanger sequencing for amplicon sequence verification (Suppl. Data).

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ygeno.2019.06.010.

References
[1] E.T. Wang, R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S.F. Kingsmore, G.P. Schroth, C.B. Burge, Alternative isoform regulation in human tissue transcriptomes, Nature 456 (2008) 470–476, https://doi.org/10.1038/nature07509.
[2] D.L. Black, Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology, Cell 103 (2000) 367–370.
[3] D.L. Black, Mechanisms of alternative pre-messenger RNA splicing, Annu. Rev. Biochem. 72 (2003) 291–336, https://doi.org/10.1146/annurev.biochem.72.121801.161720.
[4] H. Keren, G. Lev-Maor, G. Ast, Alternative splicing and evolution: diversification, exon definition and function, Nat. Rev. Genet. 11 (2010) 345–355, https://doi.org/10.1038/nrg2776.
[5] Y. Dong, A. Kaushal, M. Brattsand, J. Nicklin, J.A. Clements, Differential splicing of KLK5 and KLK7 in epithelial ovarian cancer produces novel variants with potential as cancer biomarkers, Clin. Cancer Res. 9 (2003) 1710–1720.
[6] B.R. Graveley, Alternative splicing: increasing diversity in the proteomic world, Trends Genet. 17 (2001) 100–107.
[7] B.M. Brinkman, Splice variants as cancer biomarkers, Clin. Biochem. 37 (2004) 584–594, https://doi.org/10.1016/j.clinbiochem.2004.05.015.
[8] J.P. Venables, Aberrant and alternative splicing in cancer, Cancer Res. 64 (2004) 7647–7654, https://doi.org/10.1158/0008-5472.CAN-04-1910.
[9] S. Pal, R. Gupta, R.V. Davuluri, Alternative transcription and alternative splicing in cancer, Pharmacol. Ther. 136 (2012) 283–294, https://doi.org/10.1016/j.pharmthera.2012.08.005.
[10] J.P. Venables, Unbalanced alternative splicing and its significance in cancer, Bioessays 28 (2006) 378–386, https://doi.org/10.1002/bies.20390.
[11] R. Hitzemann, D. Bottomly, P. Darakjian, N. Walter, O. Iancu, R. Searles, B. Wilmot, S. McWeeney, Genes, behavior and next-generation RNA sequencing, Genes Brain Behav. 12 (2013) 1–12, https://doi.org/10.1111/gbb.12007.
[12] C.A. Blair, X. Zi, Potential molecular targeting of splice variants for cancer treatment, Indian J. Exp. Biol. 49 (2011) 836–839.
[13] N.M. Goemans, M. Tulinius, J.T. van den Akker, B.E. Burm, P.F. Ekhart, N. Heuvelmans, T. Holling, A.A. Janson, G.J. Platenburg, J.A. Sipkens, J.M. Sitsen, A. Aartsma-Rus, G.J. van Ommen, G. Buyse, N. Darin, J.J. Verschuuren, G.V. Campion, S.J. de Kimpe, J.C. van Deutekom, Systemic administration of PRO051 in Duchenne's muscular dystrophy, N. Engl. J. Med. 364 (2011) 1513–1522, https://doi.org/10.1056/NEJMoa1011367.
[14] M. Kinali, V. Arechavala-Gomeza, L. Feng, S. Cirak, D. Hunt, C. Adkin, M. Guglieri, E. Ashton, S. Abbs, P. Nihoyannopoulos, M.E. Garralda, M. Rutherford, C. McCulley, L. Popplewell, I.R. Graham, G. Dickson, M.J. Wood, D.J. Wells, S.D. Wilton, R. Kole, V. Straub, K. Bushby, C. Sewry, J.E. Morgan, F. Muntoni, Local restoration of dystrophin expression with the morpholino oligomer AVI-4658 in Duchenne muscular dystrophy: a single-blind, placebo-controlled, dose-escalation, proof-of-concept study, Lancet Neurol. 8 (2009) 918–928, https://doi.org/10.1016/S1474-4422(09)70211-X.
[15] M.A. Economopoulou, E.G. Fragoulis, D.C. Sideris, Molecular cloning and characterization of the human RNase kappa, an ortholog of Cc RNase, Nucleic Acids Res. 35 (2007) 6389–6398, https://doi.org/10.1093/nar/gkm718.
[16] T.N. Rampias, E.G. Fragoulis, D.C. Sideris, Genomic structure and expression analysis of the RNase kappa family ortholog gene in the insect Ceratitis capitata, FEBS J. 275 (2008) 6217–6227, https://doi.org/10.1111/j.1742-4658.2008.06746.x.
[17] V.A. Shlyakhovenko, Ribonucleases in tumor growth, Exp. Oncol. 31 (2009) 127–133.
[18] U. Arnold, R. Ulbrich-Hofmann, Natural and engineered ribonucleases as potential cancer therapeutics, Biotechnol. Lett. 28 (2006) 1615–1622, https://doi.org/10.1007/s10529-006-9145-0.
[19] C. De Lorenzo, G. D'Alessio, From immunotoxins to immunoRNases, Curr. Pharm. Biotechnol. 9 (2008) 210–214.
[20] J.M. Perreira, A.M. Aker, G. Savidis, C.R. Chin, W.M. McDougall, J.M. Portmann, P. Meraner, M.C. Smith, M. Rahman, R.E. Baker, A. Gauthier, M. Franti, A.L. Brass, RNASEK is a V-ATPase-associated factor required for endocytosis and the replication of rhinovirus, influenza A virus, and dengue virus, Cell Rep. 12 (2015) 850–863, https://doi.org/10.1016/j.celrep.2015.06.076.
[21] B.A. Hackett, A. Yasunaga, D. Panda, M.A. Tartell, K.C. Hopkins, S.E. Hensley, S. Cherry, RNASEK is required for internalization of diverse acid-dependent viruses, Proc. Natl. Acad. Sci. U. S. A. 112 (2015) 7797–7802, https://doi.org/10.1073/pnas.1424098112.
[22] A.S. Gkratsou, E.G. Fragoulis, D.C. Sideris, Effect of cytostatic drugs on the mRNA expression levels of ribonuclease kappa in breast and ovarian cancer cells, Anticancer Agents Med. Chem. 14 (2014) 400–408.
[23] M.N. Kiritsi, E.G. Fragoulis, D.C. Sideris, Essential cysteine residues for human RNase kappa catalytic activity, FEBS J. 279 (2012) 1318–1326, https://doi.org/10.1111/j.1742-4658.2012.08526.x.
[24] E.D. Karousis, D.C. Sideris, A subtle alternative splicing event gives rise to a widely expressed human RNase k isoform, PLoS One 9 (2014) e96557, https://doi.org/10.1371/journal.pone.0096557.
[25] R. Thermann, G. Neu-Yilik, A. Deters, U. Frede, K. Wehr, C. Hagemeier, M.W. Hentze, A.E. Kulozik, Binary specification of nonsense codons by splicing and

4.6. Next-generation sequencing (NGS)
The next step was template preparation and enrichment, performed with the Ion PGM™ Template OT2 400 Kit (Cat. no. 4479878). Template preparation was carried out using the Ion OneTouch™ 2 instrument, which provides an automated solution for scalable and reproducible template preparation. After the end of the run, the Ion Sphere™ Quality Control Kit (Cat. no. 4468656) and a Qubit® 2.0 Fluorometer were used for the quality assessment of the templated beads. Since the quality of the samples was considered ideal, the enrichment process took place using the Ion OneTouch™ ES instrument. Finally, NGS based on semiconductor sequencing technology was carried out on the Ion Torrent Personal Genome Machine™, using the Ion PGM™ 400 Sequencing Kit (Cat. no. 4482002).

4.7. Bioinformatic analysis of the NGS data
The Torrent Suite™ software and the Torrent Server were used for planning, monitoring and viewing the sequencing data. The Torrent Server also allows alignment of the raw data with the Torrent Mapping Alignment Program (TMAP), a fast and accurate aligner that uses popular algorithms and performs alignment in the primary analysis pipeline. Although alignment of the raw data is available through the Torrent Server, our data analysis was also carried out using the open-access GALAXY online suite of software tools for NGS data (https://usegalaxy.org) [46]. The FASTQ file acquired from the Ion Torrent PGM™ was used as input for GALAXY, and HISAT2, a popular aligner for RNA-seq experiments that produces sensitive and accurate alignments, was used to align the obtained sequencing reads and to identify novel splice sites with direct mapping to known transcripts [47]. At the end of the alignment process, the BAM file produced by HISAT2 contained a list of the reads aligned to the reference genome. Then, the BED file, containing the annotated and novel splice junctions, was generated.
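As a rough illustration of how splice junctions can be read off spliced alignments: in the SAM/BAM format, a skipped stretch of the reference (typically an intron) is encoded as an `N` operation in the CIGAR string of a read. The sketch below derives junction coordinates from alignment start positions and CIGAR strings and counts the reads supporting each junction. The reads are invented toy values, the function name `junctions_from_cigar` is hypothetical, and this is not the procedure implemented by HISAT2 or GALAXY.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def junctions_from_cigar(ref_start, cigar):
    """Yield (intron_start, intron_end) for each N operation in a CIGAR.

    Coordinates are 0-based and half-open, as in BED files.
    ref_start is the 0-based leftmost aligned reference position of the read.
    """
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (pos, pos + length)
        if op in REF_CONSUMING:
            pos += length


# Toy alignments (start position, CIGAR); not data from the study.
reads = [
    (1000, "50M200N50M"),
    (1020, "30M200N70M"),
    (1000, "50M200N20M300N30M"),
    (1100, "100M"),  # unspliced read, contributes no junction
]

junction_counts = Counter(
    junction
    for start, cigar in reads
    for junction in junctions_from_cigar(start, cigar)
)
for (start, end), n_reads in sorted(junction_counts.items()):
    print(f"junction {start}-{end}: supported by {n_reads} read(s)")
```

Junctions whose coordinates do not match any annotated intron are the candidates for novel splicing events; the number of supporting reads gives a first indication of how well each one is supported.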
Finally, both the BAM and BED files were visualized using IGV, a high-performance viewer that enables interactive visual exploration of the results contained in these files [48].

4.8. Expression analysis of the novel RNASEK transcripts
In order to investigate the expression profile of each novel RNASEK splice variant, nested PCR reactions were carried out, using the 19 diluted 3′ RACE products as templates and variant-specific primer pairs (Table 2 and Suppl. Table 1). Consequently, the expression of each novel RNASEK transcript was tested in breast adenocarcinoma, ductal carcinoma of the breast, ovarian cancer, endometrial adenocarcinoma, cervical carcinoma, prostate cancer, urinary bladder cancer, renal cell carcinoma, colorectal cancer, gastric adenocarcinoma, hepatocellular carcinoma, brain cancer, lung adenocarcinoma, melanoma, lymphoma, leukemia, normal embryonic kidney, normal pancreas and head and neck squamous cell carcinoma. The expression profile of each novel transcript was assessed by electrophoresis of the obtained PCR products in agarose gels. Finally, bands were excised and purified using a Gel and PCR Clean-up kit (Macherey-Nagel GmbH &


cytoplasmic translation, EMBO J. 17 (1998) 3484–3494, https://doi.org/10.1093/emboj/17.12.3484.
[26] J. Zhang, X. Sun, Y. Qian, J.P. LaDuca, L.E. Maquat, At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: a possible link between nuclear splicing and cytoplasmic translation, Mol. Cell. Biol. 18 (1998) 5272–5283.
[27] J. Zhang, X. Sun, Y. Qian, L.E. Maquat, Intron function in the nonsense-mediated decay of beta-globin mRNA: indications that pre-mRNA splicing in the nucleus can influence mRNA translation in the cytoplasm, RNA 4 (1998) 801–815.
[28] A. Roy, A. Kucukural, Y. Zhang, I-TASSER: a unified platform for automated protein structure and function prediction, Nat. Protoc. 5 (2010) 725–738, https://doi.org/10.1038/nprot.2010.5.
[29] Y. Zhang, I-TASSER server for protein 3D structure prediction, BMC Bioinformatics 9 (2008) 40, https://doi.org/10.1186/1471-2105-9-40.
[30] V.G. LeBlanc, M.A. Marra, Next-generation sequencing approaches in cancer: where have they brought us and where will they take us? Cancers 7 (2015) 1925–1958, https://doi.org/10.3390/cancers7030869.
[31] T. Shen, S.H. Pajaro-Van de Stadt, N.C. Yeat, J.C. Lin, Clinical applications of next generation sequencing in cancer: from panels, to exomes, to genomes, Front. Genet. 6 (2015) 215, https://doi.org/10.3389/fgene.2015.00215.
[32] R. Goya, M.G. Sun, R.D. Morin, G. Leung, G. Ha, K.C. Wiegand, J. Senz, A. Crisan, M.A. Marra, M. Hirst, D. Huntsman, K.P. Murphy, S. Aparicio, S.P. Shah, SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors, Bioinformatics 26 (2010) 730–736, https://doi.org/10.1093/bioinformatics/btq040.
[33] R. Morin, M. Bainbridge, A. Fejes, M. Hirst, M. Krzywinski, T. Pugh, H. McDonald, R. Varhol, S. Jones, M. Marra, Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing, BioTechniques 45 (2008) 81–94, https://doi.org/10.2144/000112900.
[34] M. Sultan, M.H. Schulz, H. Richard, A. Magen, A. Klingenhoff, M. Scherf, M. Seifert, T. Borodina, A. Soldatov, D. Parkhomchuk, D. Schmidt, S. O'Keeffe, S. Haas, M. Vingron, H. Lehrach, M.L. Yaspo, A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome, Science 321 (2008) 956–960, https://doi.org/10.1126/science.1160342.
[35] A. Kladi-Skandali, K. Mavridis, A. Scorilas, D.C. Sideris, Expressional profiling and clinical relevance of RNase kappa in prostate cancer: a novel indicator of favorable progression-free survival, J. Cancer Res. Clin. Oncol. 144 (2018) 2049–2057, https://doi.org/10.1007/s00432-018-2719-0.
[36] I. Valles, M.J. Pajares, V. Segura, E. Guruceaga, J. Gomez-Roman, D. Blanco, A. Tamura, L.M. Montuenga, R. Pio, Identification of novel deregulated RNA metabolism-related genes in non-small cell lung cancer, PLoS One 7 (2012) e42086, https://doi.org/10.1371/journal.pone.0042086.
[37] W.C. Kim, C.H. Lee, The role of mammalian ribonucleases (RNases) in cancer, Biochim. Biophys. Acta 1796 (2009) 99–113, https://doi.org/10.1016/j.bbcan.2009.05.002.
[38] L. Liu, Y. Li, S. Li, N. Hu, Y. He, R. Pong, D. Lin, L. Lu, M. Law, Comparison of next-generation sequencing systems, J. Biomed. Biotechnol. 2012 (2012) 251364, https://doi.org/10.1155/2012/251364.
[39] E.R. Mardis, Next-generation sequencing platforms, Annu. Rev. Anal. Chem. (Palo Alto, Calif.) 6 (2013) 287–303, https://doi.org/10.1146/annurev-anchem-062012-092628.
[40] J.Y. Park, L.J. Kricka, P. Clark, E. Londin, P. Fortina, Clinical genomics: when whole genome sequencing is like a whole-body CT scan, Clin. Chem. 60 (2014) 1390–1392, https://doi.org/10.1373/clinchem.2014.230276.
[41] I. Kouskoumvekaki, N. Shublaq, S. Brunak, Facilitating the use of large-scale biological data and tools in the era of translational bioinformatics, Brief. Bioinform. 15 (2014) 942–952, https://doi.org/10.1093/bib/bbt055.
[42] S. Sukumaran, Concentration determination of nucleic acids and proteins using the micro-volume BioSpec-nano-spectrophotometer, J. Vis. Exp. (2011), https://doi.org/10.3791/2699.
[43] M.A. Frohman, M.K. Dush, G.R. Martin, Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer, Proc. Natl. Acad. Sci. U. S. A. 85 (1988) 8998–9002.
[44] J. Ye, G. Coulouris, I. Zaretskaya, I. Cutcutache, S. Rozen, T.L. Madden, Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction, BMC Bioinformatics 13 (2012) 134, https://doi.org/10.1186/1471-2105-13-134.
[45] D.J. Korbie, J.S. Mattick, Touchdown PCR for increased specificity and sensitivity in PCR amplification, Nat. Protoc. 3 (2008) 1452–1456, https://doi.org/10.1038/nprot.2008.133.
[46] D. Blankenberg, J. Hillman-Jackson, Analysis of next-generation sequencing data using Galaxy, Methods Mol. Biol. 1150 (2014) 21–43, https://doi.org/10.1007/978-1-4939-0512-6_2.
[47] D. Kim, B. Langmead, S.L. Salzberg, HISAT: a fast spliced aligner with low memory requirements, Nat. Methods 12 (2015) 357–360, https://doi.org/10.1038/nmeth.3317.
[48] H. Thorvaldsdottir, J.T. Robinson, J.P. Mesirov, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Brief. Bioinform. 14 (2013) 178–192, https://doi.org/10.1093/bib/bbs017.