Genome-wide identification and characterization of lncRNAs and miRNAs in cluster bean (Cyamopsis tetragonoloba)


Gene 667 (2018) 112–121

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Research paper

Sarika Sahu a,b,1, Atmakuri Ramakrishna Rao a,⁎,1, Jaya Pandey a, Kishor Gaikwad d, Sabari Ghoshal b, Trilochan Mohapatra c

a ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
b Amity University, Noida, Uttar Pradesh, India
c Indian Council of Agricultural Research (ICAR), New Delhi, India
d ICAR-National Research Center on Plant Biotechnology, New Delhi, India

ARTICLE INFO

Keywords: Long non coding RNAs; Endogenous target mimic; Cluster bean; Micro RNA; Principal component analysis

ABSTRACT

Long non-coding RNAs (lncRNAs) are a class of non-protein-coding RNAs that play a crucial role in many biological processes, such as nodule metabolism, flowering time and male sterility. The function of lncRNAs is quite often species-specific. An attempt has therefore been made, for the first time in cluster bean (Cyamopsis tetragonoloba), to identify lncRNAs computationally on the basis of a proposed index and to study their target genes. The target genes of the lncRNAs were identified and characterized for their role in biological processes such as stress mechanisms, DNA damage repair and cell wall synthesis. In addition, lncRNAs and miRNAs bearing Simple Sequence Repeats (SSRs), which contribute towards the biogenesis of small non-coding RNAs, were identified. Moreover, five novel endogenous Target Mimic lncRNAs (eTMs) were identified that may disrupt miRNA-mRNA regulation. For ease of use, a database, CbLncRNAdb, has been developed and made available at http://cabgrid.res.in/cblncrnadb.

Abbreviations: lncRNAs, long non coding RNAs; SSR, Simple Sequence Repeat; eTM, endogenous Target Mimic; ncRNAs, non-coding RNAs; COLDAIR, cold-assisted intronic non-coding RNA; COOLAIR, cold-induced long antisense intragenic RNA; FLC, Flowering Locus C; LDMAR, long-day-specific male-fertility-associated RNA; miRNAs, micro RNAs; AtIPS1, induced by phosphate starvation1; ORF, Open Reading Frame; CPC, Coding Potential Calculator; PLEK, predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer; FPKM, fragments per kilobase of transcript per million mapped reads; HPlncRNAs, highly probable lncRNAs; PCs, principal components; PCS, principal component scores; MFE, minimum fold energy; TE, transposable element; IIS, Internet Information Services; ODBC, open database connectivity; LPlncRNAs, low probable lncRNAs; XyG XT, xyloglucan 6-xylosyltransferase; GMGT, galactomannan galactosyltransferase; pre-miRNA, precursor miRNA; TNRs, trinucleotide repeats; ndG, normalized binding free energy

⁎ Corresponding author. E-mail address: [email protected] (A.R. Rao).
1 Equally contributed to this work (Joint First Authors).

https://doi.org/10.1016/j.gene.2018.05.027
Received 18 January 2018; Received in revised form 24 April 2018; Accepted 8 May 2018
0378-1119/ © 2018 Elsevier B.V. All rights reserved.

1. Introduction

With the emergence of high-throughput sequencing technology, the RNAome, which includes the non-coding RNAs (ncRNAs), has become relatively easy to study (Wang et al., 2015). The ncRNAs are a group of RNAs with no coding potential (Mercer et al., 2009), and recent studies have shown that they are functional and are expressed in tissues in a spatio-temporal manner (Greilhuber et al., 2005; Lakhotia, 2016). Among the ncRNAs, lncRNAs and miRNAs are considered to play an essential role in multiple biological processes (Derrien et al., 2012). The functional mechanisms of lncRNAs are diverse and highly significant, as the expression of a lncRNA alone is sufficient to regulate nearby or distant genes through post-transcriptional chromatin complexes (Mercer et al., 2009; Ponting et al., 2009; Wilusz et al., 2009). Ponting et al. (2009) divided lncRNAs into sense, anti-sense, bi-directional, intronic and inter-genic classes on the basis of their position relative to protein-coding genes. Moreover, lncRNAs are generally expressed at low levels and lack sequence similarity among species (Marques and Ponting, 2014). A vast number of lncRNAs have been reported in animals, whereas very few have been explored in plants so far (Liu et al., 2012). State-of-the-art technologies such as next-generation sequencing ease the analysis of thousands of lncRNAs in model plants such as Arabidopsis thaliana (Zhang et al., 2013; Zhu et al., 2013; Xie et al., 2014). Heo and Sung (2011) reported the regulation of flowering time through the association of lncRNAs such as cold-induced long antisense intragenic RNA (COOLAIR) and cold-assisted intronic non-coding RNA (COLDAIR) with epigenetic repression of Flowering Locus C (FLC) in Arabidopsis. Another lncRNA, long-day-specific male-fertility-associated RNA (LDMAR), is involved in photoperiod-regulated male sterility in rice (Ding et al., 2012), and lncRNAs have also been implicated in ripening in tomato (Zhu et al., 2015).

In contrast to lncRNAs, micro RNAs (miRNAs) are very short ncRNAs, usually processed from long self-complementary precursor sequences (Rodriguez et al., 2004). miRNAs play a key role at the post-transcriptional level in processes such as biotic/abiotic stress responses, tissue differentiation, growth and development (Kidner and Martienssen, 2005). Besides, they are highly conserved among distantly related plant species, from non-vascular bryophytes to monocots (Zhang et al., 2006). So far, the miRNAs of many leguminous plants, such as Acacia auriculiformis, Arachis hypogaea, Lotus japonicus, Acacia mangium, Glycine max, Medicago truncatula, Vigna unguiculata, Phaseolus vulgaris and Glycine soja, are available in miRBase (Griffiths-Jones et al., 2006), whereas the miRNAs of cluster bean are yet to be fully explored.

Complementary pairing between miRNAs and lncRNAs gives rise to endogenous target mimics (eTMs) in cells (German et al., 2008; Cong et al., 2013). The lncRNAs take part in complex biological phenomena, including gene transcription and translation, protein localization, cellular structure integrity and heat shock response (Fan et al., 2015). Some mechanisms, such as modulation of pre-mRNA splicing, RNA editing and abrogation of miRNA-induced repression, might involve binding between lncRNAs and other RNA molecules. Wu et al. (2013) predicted that the eTMs of several miRNAs have the potential to abolish the binding between miRNAs and their targets, and found that eTMs might inhibit miRNA function in a spatio-temporal manner during plant development in rice and Arabidopsis. Franco-Zorrilla et al. (2007) reported that the eTM AtIPS1 (induced by phosphate starvation1) acts as a target of the miRNA ath-mir399 with one bulge of 3 nt, which perturbs the cleavage effect of the miRNA in Arabidopsis and thus regulates the uptake of phosphorus in plant cells.

Cluster bean has been popular in the Indian textile industry since the 18th century; its galactomannan (a polysaccharide) is a key ingredient that is also used in other industries such as paper, petroleum, mining and pharmaceuticals (Hymowitz and Upadhya, 1963). Its gum has great medicinal value and is widely used to control multiple conditions such as diabetes, high cholesterol, diarrhea and irritable bowel syndrome (Mudgil et al., 2014). In spite of its economic significance, cluster bean is little explored and uncharacterized at the genomic level, especially the non-coding part of the genome. Moreover, lncRNAs, miRNAs and eTMs have not yet been fully identified and characterized in cluster bean (Cyamopsis tetragonoloba). Hence, there is a need to explore the role of non-coding RNAs such as lncRNAs, miRNAs and eTMs in various biological processes of cluster bean.

Fig. 1. An integrative computational pipeline for the systematic identification of noncoding RNAs (lncRNA and miRNA) and their targets.

2. Material and methods

2.1 Data processing

Initially, the leaf transcriptome of cluster bean variety M-83 was downloaded from ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR321/SRR3218523/ (Tanwar et al., 2017). Subsequently, the FastQC program was run to check the quality of the transcriptome data (Andrews, 2010). Low-quality sequences were then removed using Trimmomatic (Bolger et al., 2014), with a minimum quality score of 35 and a minimum length of 25 nt, to improve the quality and reliability of the reads. As a reference genome of cluster bean is not available, we assembled the reads de novo using Trinity (Haas et al., 2013b) (Fig. 1) with the default k-mer size of 25. The contribution of each transcript was identified by mapping the reads against the assembled data with the Bowtie2 aligner (Langmead and Salzberg, 2012), which is available in Trinity.

2.1. Candidate lncRNA prediction

The de novo assembled transcriptome obtained from Trinity was used for the identification of lncRNAs (Mu et al., 2016) by following the pipeline shown in Fig. 1. Stringent criteria, namely (i) length > 200 nt, (ii) Open Reading Frame (ORF) length < 100 nt, (iii) Coding Potential Calculator (CPC) score < −1 and (iv) predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme (PLEK) score < −1, were applied to eliminate protein-coding transcripts. These four steps were carried out as follows. In-house Perl scripts were written to select sequences longer than 200 nt from the assembled data. The selected sequences were then submitted to the CPC program (Kong et al., 2007) to differentiate putative coding transcripts from non-coding transcripts: the coding potential of each transcript was calculated, and transcripts with a CPC index less than −1 were retained as non-coding. Subsequently, PLEK, a powerful alignment-free computational tool, was used to distinguish long non-coding sequences from the rest, as a reference genome is not available (Li et al., 2014a). Since most protein-coding genes have an ORF length > 100 amino acids, the TransDecoder tool (Haas et al., 2013a) was used to identify sequences with ORF length < 100 nt, which can be considered probable lncRNAs (Haas et al., 2010). To further substantiate the filtered sequences, HMMER (Eddy, 2001) was used. The filtered transcripts were then searched with BLAST against tRNA, rRNA and snoRNA databases to remove these housekeeping RNAs. In addition, the Augustus tool (Stanke et al., 2004) was used to predict the number of exons in the putative lncRNAs.
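The four thresholds used for candidate lncRNA prediction can be combined into a single filter, as in the minimal Python sketch below. It is an illustration only, not the in-house Perl scripts used in the study: the `Transcript` record, its field names and the function names are hypothetical, and the CPC score, PLEK score and ORF length are assumed to have been computed beforehand by the respective tools.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record; every value is assumed to be precomputed upstream."""
    transcript_id: str
    length_nt: int      # transcript length in nucleotides
    orf_length: int     # longest ORF length, in the unit used by the ORF finder
    cpc_score: float    # score reported by CPC
    plek_score: float   # score reported by PLEK


def is_candidate_lncrna(t, min_length=200, max_orf=100, max_cpc=-1.0, max_plek=-1.0):
    """Apply the four criteria: length > 200, ORF < 100, CPC < -1, PLEK < -1."""
    return (
        t.length_nt > min_length
        and t.orf_length < max_orf
        and t.cpc_score < max_cpc
        and t.plek_score < max_plek
    )


def filter_candidates(transcripts):
    """Keep only the transcripts that pass all four criteria."""
    return [t for t in transcripts if is_candidate_lncrna(t)]
```

Requiring all four conditions at once corresponds to taking the intersection of the individually filtered sets; the later steps (HMMER search, removal of housekeeping RNAs) are not covered by this sketch.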

2.2. Expression values of lncRNAs

The metric fragments per kilobase of transcript per million mapped reads (FPKM) was used to estimate the expression level of the assembled transcripts, as it is appropriate for paired-end reads. In other words, FPKM considers the concomitant mapping of both read ends of a cDNA fragment rather than the mapping of individual reads (Young et al., 2008). The RSEM package, as integrated in Trinity, was used to estimate the FPKM values of the transcripts, including the putative lncRNAs. Transcripts with FPKM values less than zero were discarded and the rest were taken for further analysis.

2.3. Proposed method for identification of highly probable lncRNAs

An index score based on principal component analysis is proposed to identify the highly probable lncRNAs (HPlncRNAs). Six parameters, viz., FPKM value, ORF length, number of exons, CPC score, PLEK score and length of the lncRNA, were considered as variables to compute the correlation matrix, which was used to derive the principal components (PCs) (eigen vectors) and their respective eigen values. The principal component scores (PCS) were then computed from the PCs. A weighted average of the principal component scores was computed for each putative lncRNA with the following formula:

$$I_{\mathrm{lncRNA}} = \frac{\sum_{i=1}^{6} \lambda_i \, \mathrm{PCS}_i}{\sum_{i=1}^{6} \lambda_i}, \qquad i = 1, 2, \ldots, 6,$$

where λi is the ith eigen value and PCSi is the ith principal component score. The index values (IlncRNA) of the putative lncRNAs were computed and then ranked, with the highest index value being assigned rank one. The highly scored lncRNAs were then considered for the identification of highly probable lncRNAs (HPlncRNAs).

2.4. Prediction of miRNAs

The total assembled transcripts obtained from Trinity were mapped against the plant miRNAs available in the miRBase database (release 21) (ftp://mirbase.org/pub/mirbase/CURRENT/) using the blastn program (Altschul et al., 1990) with two parameters, e-value < 0.001 and word-size > 16. As the primary miRNA precursors (pri-miRNAs) in plants are heterogeneous and their size ranges between 55 and 900 nt, with an average of 145 nt (Jones-Rhoades et al., 2006), a Perl script was developed to extract possible pri-miRNA sequences. All candidate pre-miRNAs were submitted to the RNAfold tool (Denman, 1993) to obtain the structure and minimum fold energy (MFE) of the predicted miRNAs. Further, candidate structures were filtered on the basis of MFE (less than −40 kcal/mol) and used to predict the mature miRNAs from the pri-miRNA sequences (Wang et al., 2016).

2.5. Identification of lncRNAs as target mimics

The genome-wide predicted lncRNAs and miRNAs were considered for the identification of eTMs using the parameters listed in Wu et al. (2013). These parameters are: (i) no bulges are allowed other than at the 9th to 12th positions from the 5′ end of the miRNA sequence; (ii) the bulge in eTMs should have only three nucleotides; (iii) perfect nucleotide pairing is essential at the 2nd to 8th positions from the 5′ end; and (iv) the total number of mismatches and G/U pairs within the eTM and miRNA pairing regions should be less than three, excluding the central bulge. The plant miRNA target prediction software psRobot was used to predict the putative eTMs. The psRobot software was run with moderate parameters: penalty score threshold = 2.5, five prime boundary of essential sequence = 2, three prime boundary of essential sequence = 17, maximal number of permitted gaps = 1, and position after which gaps are permitted = 17 (Wu et al., 2012).

2.6. Identification of lncRNAs and miRNAs bearing SSRs

Simple sequence repeats (SSRs) are microsatellite markers that contain valuable information on the genetic diversity of plants (Misganaw and Abera, 2017). They are co-dominant, hypervariable and evenly distributed throughout the coding and non-coding regions (Oliveira et al., 2006). The commonly used Misa tool (http://pgrc.ipk-gatersleben.de/misa/) was used to find the distribution and frequency of SSRs (mono, di, tri, tetra, penta and hexa) in the assembled transcriptome data. The default parameters of the Misa tool with respect to frequency of repeats were considered for the prediction of SSRs, viz., 10, 6, 4, 3, 3 and 3 for mono, di, tri, tetra, penta and hexa nucleotide repeats, respectively.

2.7. Analysis of repetitive elements in lncRNAs and miRNAs

The repetitive elements in eukaryotic genomes reveal the biogenesis of a few small non-coding RNAs (Joy and Soniya, 2012) and lncRNAs. Besides, transposable elements (TEs) play a very important role in the origin of heterochromatic small RNAs (hcRNAs), miRNAs, lncRNAs and piRNAs (Farazi et al., 2008). The origin, evolution and function of lncRNAs in the cell also depend on the insertion of TEs (Kelley and Rinn, 2012; Kapusta and Feschotte, 2014). The frequency and distribution of TEs and repetitive elements in the non-coding RNA sequences were analyzed with the RepeatMasker tool (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker).

2.8. Evolutionarily conserved lncRNAs

For the identification of conserved lncRNAs in cluster bean, the known lncRNAs of the plant species Glycine max, Arabidopsis thaliana, Zea mays, Oryza sativa and Amborella trichopoda were first downloaded from CANTATAdb, a collection of plant long non-coding RNAs (http://www.lncrnablog.com/cantatadb-a-collection-of-plant-longnon-coding-rnas/). To study their conservation, the cluster bean lncRNAs were then mapped (using the blastn programme) against all the lncRNAs collected from CANTATAdb.

2.9. Cluster bean lncRNA database (CbLncRNAdb)

A relational database, CbLncRNAdb, has been designed and populated with the predicted lncRNAs of cluster bean. Access to CbLncRNAdb is provided through an online portal (http://cabgrid.res.in/cblncrnadb). The database was built in MySQL at the back end, and server-side scripting was done using ASP with C# under the .NET framework. Internet Information Services (IIS) is used as the web server for providing online access. The front end is designed in HTML 5.0, CSS and jQuery. The MySQL database is connected to ASP.NET using an open database connectivity (ODBC) driver. A help manual in HTML is provided for users. A search facility for lncRNAs and HPlncRNAs is available in the database, and a Statistics menu gives summary information on the database. CbLncRNAdb also provides information on lncRNAs, SSRs, miRNAs, lncRNAs bearing SSRs, miRNAs bearing SSRs and eTMs.
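The eTM pairing rules taken from Wu et al. (2013) in Section 2.5 can be read as a simple checklist on a miRNA:lncRNA duplex. The Python sketch below is one possible reading of those rules and is not psRobot or the authors' procedure. The encoding of the duplex (one symbol per miRNA nucleotide plus a separate bulge description) and the function name are hypothetical, and the sketch assumes that a duplex without a bulge does not qualify as an eTM.

```python
def is_etm_candidate(pairing, bulge_after, bulge_size):
    """Check one miRNA:lncRNA duplex against the four pairing rules.

    pairing     -- one symbol per miRNA nucleotide, read 5' to 3':
                   'M' Watson-Crick pair, 'W' G/U pair, 'X' mismatch
    bulge_after -- 1-based miRNA position after which the unpaired lncRNA
                   nucleotides sit, or None if the duplex has no bulge
    bulge_size  -- number of unpaired lncRNA nucleotides in that bulge
    """
    if set(pairing) - set("MWX"):
        raise ValueError("pairing may contain only 'M', 'W' and 'X'")
    # Rules (i) and (ii): one 3-nt bulge lying between miRNA positions 9 and 12
    # (assumption: this means the bulge follows position 9, 10 or 11).
    if bulge_after is None or bulge_size != 3 or not 9 <= bulge_after <= 11:
        return False
    # Rule (iii): positions 2 to 8 from the 5' end must be perfectly paired.
    if pairing[1:8] != "M" * 7:
        return False
    # Rule (iv): fewer than three mismatches plus G/U pairs; bulge nucleotides
    # are not part of `pairing`, so they are excluded automatically.
    imperfect = sum(1 for symbol in pairing if symbol != "M")
    return imperfect < 3
```

For example, a 21-nt miRNA that pairs perfectly except for a 3-nt bulge after position 10 passes the check, whereas the same duplex with a mismatch at position 3 does not.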


3. Results and discussion

3.1 Transcripts assembly

The RNA-seq data from leaf tissue of cluster bean (SRR3218523), containing 2,868,803 reads (≈5 Gb), were used for lncRNA identification. The quality of the reads was checked on the basis of the Q-score (phred score > 33). After the adapters had been removed with Trimmomatic, the trimmed data were assembled de novo with Trinity. The de novo assembly of the RNA-seq data generated 53,579 transcriptome contigs with an average contig length of 577 bp and an N50 value of 754 bp.

3.1. Identification of cluster bean lncRNAs

A total of 19,011 (20%) out of 53,579 transcripts were retained for downstream analysis after filtering for CPC score < −1 and FPKM > 1 (Wang et al., 2015). This criterion was followed in order to retain strongly non-coding transcripts. At the same time, PLEK was applied to retain non-coding transcripts with a PLEK score less than −1. The intersection of the CPC- and PLEK-filtered transcripts (CPC ∩ PLEK) yielded 17,003 transcripts. It is possible that a few lncRNAs contain a partial or complete ORF by chance. Thus, the ORF length of the lncRNAs was kept below 100, as suggested by Boerner and McGinnis (2012). Hence, the TransDecoder package (http://transdecoder.sourceforge.net/), which identifies candidate coding regions within transcript sequences generated by de novo RNA-seq transcriptome assembly, was used to retain the transcripts with ORF length < 100. To eliminate coding transcripts further, HMMER was run against the Pfam database with an e-value ≤ 1e−10. The remaining transcripts were searched against the miRBase, tRNA and rRNA databases using blastn with e-value < 1e−10 and identity > 90% to filter out housekeeping RNAs. Finally, a total of 11,516 putative lncRNAs were retained and designated CT_lnc_00001 to CT_lnc_11516.

The average length of the putative lncRNAs of cluster bean was found to be 287 bp. Detailed characteristics such as the length-wise distribution (Fig. 2a) and the exonic distribution (Fig. 2b) of the lncRNAs were also worked out. It was reported earlier that the average length of lncRNAs varies from species to species, i.e., 285 bp, 364 bp, 463 bp and 323 bp for Arabidopsis, maize, kiwifruit and rice, respectively (Liu et al., 2012; Zhang et al., 2014; Li et al., 2014b; Tang et al., 2016). Besides, the analysis performed with the Augustus tool showed that 1564 (12%) of the 11,516 lncRNAs have exons. Among these, 91% were mono-exonic (Fig. 2b), compared with 81% in kiwifruit and 81% in maize (Li et al., 2014b; Tang et al., 2016).

Fig. 2. The characteristic feature of cluster bean lncRNA. a) Length wise distribution of lncRNA. b) Distribution of exon on lncRNAs.

3.2. The expression of lncRNAs

The average expression value (FPKM) of the predicted lncRNAs was found to be 19.87. On the basis of expression values, the lncRNAs were grouped into six clusters. Most lncRNAs (75.22%) lie between FPKM values of 1 and 10, while the smallest share (0.05%) falls below an FPKM value of 1 (Fig. 3). The expression levels of cluster bean lncRNAs are similar to those of lncRNAs in other plants such as chickpea, Arabidopsis, cucumber, rice and maize (Li et al., 2014b; Zhang et al., 2014; Hao et al., 2015; Khemka et al., 2016).

Fig. 3. Distribution of lncRNAs over a range of expression values.

3.3. Identification of highly probable lncRNAs

Using the index score outlined in the methodology, the lncRNAs were arranged in descending order (Supplementary Table 1). A total of 1101 lncRNAs had positive index scores with expression value > 0.5 and were designated highly probable lncRNAs (HPlncRNAs). In addition, a total of 5599 lncRNAs with expression value < 0.5 were designated less probable lncRNAs (LPlncRNAs). On the basis of expression value and length, we compared HPlncRNAs with LPlncRNAs (Fig. 4). As the expression value moves from the 1–10 category to the 10–20 category and on to > 40, the percentage of HPlncRNAs increases (Fig. 4a). Similar results were found for the length-wise distribution (Fig. 4b). The top 50 HPlncRNAs are given with their expression values and lengths in Supplementary Table 1. Nineteen of the 1101 HPlncRNAs show significant interaction with target genes of cluster bean. These targets are involved in resistance against microbial pathogens and plant tolerance to abiotic stress. The target gene of one of the lncRNAs is xyloglucan 6-xylosyltransferase (XyG XT) 2-like (Gene ID: 11420236). This gene is related to galactomannan galactosyltransferase (GMGT) (Edwards et al., 1999; Faik et al., 2002), which in turn helps in the synthesis of galactomannan (Bhullar and Bhullar, 2012).
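The index score defined in the methodology (Section 2.3) is a weighted average of principal component scores, with the eigen values as weights. The minimal Python sketch below shows only that final step. It assumes the eigen values λi and the per-transcript scores PCSi have already been obtained from a PCA on the correlation matrix of the six variables; the function names and the example numbers are hypothetical and are not values from the study.

```python
def index_score(eigenvalues, pc_scores):
    """Weighted average of principal component scores, weights = eigen values."""
    if len(eigenvalues) != len(pc_scores):
        raise ValueError("need one principal component score per eigen value")
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("eigen values must not sum to zero")
    return sum(lam * pcs for lam, pcs in zip(eigenvalues, pc_scores)) / total


def rank_by_index(eigenvalues, scores_by_id):
    """Return (transcript_id, index) pairs sorted so that rank one comes first."""
    indexed = {tid: index_score(eigenvalues, s) for tid, s in scores_by_id.items()}
    return sorted(indexed.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only (three components instead of six, for brevity).
example_ranking = rank_by_index(
    [3.0, 2.0, 1.0],
    {"lnc_A": [1.0, 0.0, -1.0], "lnc_B": [-1.0, 0.5, 0.0]},
)
```

Because the weights are the eigen values, components that explain more of the variance contribute more to the index; with a correlation matrix of six variables the weights sum to six.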

Fig. 4. Categorization of HPlncRNA and LPlncRNA in cluster bean. a) Lengthwise categorization. b) Categorization on the basis of FPKM values.

3.4. Identification of miRNAs

As miRNAs are conserved across plant species, the transcripts were searched against miRBase release 21 (ftp://mirbase.org/pub/mirbase/CURRENT/) using the blastn program (Altschul et al., 1990) with e-value < 0.001, and 105 hits satisfying this criterion were found. A total of 52 best pri-miRNAs were filtered using the triplet SVM tool with favourable free energy less than −40 kcal/mol. The pri-miRNA sequences were submitted to the MatureBayes tool (Gkirtzou et al., 2010) for the identification of mature miRNAs, which yielded the 22 most stable candidate miRNAs (Table 1). The Ambros rule was applied for the nomenclature of cluster bean miRNAs (Ambros et al., 2003). The average %GC content and MFE of the miRNAs were found to be 43.28 and −50.27 kcal/mol, respectively. The percentages of the starting nucleotides A, G, C and U of mature miRNA sequences in the miRBase database are 23.87, 11.26, 16.64 and 48.22, respectively. This shows that the percentage of starting nucleotides G and C is quite low in comparison with A and U in the mature miRNA sequences of the miRBase database. Similarly, our results show a higher percentage of A (40.91%) and U (36.36%) as starting nucleotides of mature miRNA sequences in contrast to G (22.73%) and C (0%). The most stable miRNA is cte_miR1134, with an MFE of −72.6 kcal/mol (Fig. 5).

Table 1 List of putative miRNAs found on cluster bean genome showing %GC and MFE.

Putative candidate miRNA   Start position   Mature 5′ stem            GC%           MFE (kcal/mol)
cte_miR5084                20               UUACCGUAUUGCAGGUGGGCCU    52.17391304   −54.9
cte_miR168                 5                UUUCUUACCCUGCACCACCACC    52.17391304   −45.4
cte_miR8685                1                GGAGCACAUCCCAAAAGCCAGA    52.17391304   −51.1
cte_miR8785                11               AGAAGUGACGAGUCCGAACUCG    52.17391304   −48
cte_miR5640                31               AGACAGAGAGAGUUGGACUUGG    47.82608696   −42.7
cte_miR8713                19               AUUACGGUUUCGGUGUUUCGGG    47.82608696   −58.8
cte_miR531                 9                GAUGCUUCCUGUUGCAGCAGAU    47.82608696   −42.5
cte_miR7748a-5p            11               GUCCAACAUCGCGGUAAAGAGA    47.82608696   −51.7
cte_miR6183                5                UGAUGGUGUUGGCAGCAGAAGA    47.82608696   −57.8
cte_miR04                  13               AUUAAUCCUGGUGGUUGCAAGG    43.47826087   −46
cte_miR824-3p              12               UCUUCUCCUCCAUCUUCCUUCU    43.47826087   −52.37
cte_miR117                 15               AGACGCCAUUGAAGAGCUUGUA    43.47826087   −44.5
cte_miR3442-5p             34               GCCAUGAUUCGAUAGUGACUUG    43.47826087   −64
cte_miR128-p               18               AUCAGACUUGUGUUGAUCCUUG    39.13043478   −49
cte_miRR7772-3p            8                AGGCCUCCUUGAAAUAAAAAGC    39.13043478   −59.8
cte_miR8741                22               UAGCAAUGUUGAUGGUGAUGAC    39.13043478   −49.5
cte_miR8577                19               AAGAGACUCAGGUUACUUCAAC    39.13043478   −41.9
cte_miR5644                13               UAGUACCAUUAGAUCUGAGACG    39.13043478   −43.7
cte_miR1134                32               UGUAGAGACGAAUGAACCUCAA    39.13043478   −72.6
cte_miR7780-3p             29               AAAGGACCCUUAAUGAAAGCUU    34.7826087    −44.9
cte_miR7758-5p             24               UACUACUAUUACUUGGUUCUGA    30.43478261   −44
cte_miR168                 7                GUCAUAGACCAAAUAUUGUUUC    30.43478261   −40.9

Fig. 5. Top ten most stable putative miRNAs of cluster bean showing MFE.

3.5. Identification of lncRNAs as candidate endogenous target mimics (eTMs) of miRNAs

Our results showed that a few miRNAs with minimum fold energy (Table 1) bind to lncRNAs through complementary base pairing. Accordingly, we predicted eTMs using the parameters of Wu et al. (2013). The putative eTMs are Ct_lnc_1644, Ct_lnc_5534, Ct_lnc_11423, Ct_lnc_7639 and Ct_lnc_1751. Ct_lnc_1644 and Ct_lnc_5534 acted as targets for the conserved miRNA mir824, while Ct_lnc_11423, Ct_lnc_7639 and Ct_lnc_1751 acted as targets for mir5640, mir7780p and mir168, respectively (Figs. 6 and 7). The mir824 is highly conserved and targets a MADS-box transcription factor in Arabidopsis (Rajagopalan et al., 2006), whereas miR168 acts in a negative-feedback mechanism that controls the expression of Argonaute (AGO1) (Rhoades et al., 2002). Further, AGO1 is essential for leaf development and axillary shoot meristem formation in Arabidopsis. Thus, we expect a similar involvement of the mir824 and mir168 mechanisms in cluster bean. Besides, Fan et al. (2015) identified eTMs in degradome data of maize and observed that lncRNAs disrupted miRNA-mRNA regulation; they found 34 lncRNAs acting as targets for 33 miRNAs involved in the regulation of various mRNAs. In a similar way, we identified five putative eTMs from the cluster bean transcriptome data. The eTMs identified in the present study are predicted as targets of miRNAs on the basis of the competing endogenous RNA (ceRNA) hypothesis (Wu et al., 2013; Zhang et al., 2014; Wang et al., 2015).

We suggest that the identified eTMs target specific miRNAs (mir168, mir824) in a form of target mimicry that protects the target mRNAs (AGO1 and MADS-box transcription factor) from regulation in cluster bean. Thus, the eTMs may disrupt the miRNA-mRNA regulations, i.e., the mir824-MADS-box and mir168-AGO1 regulations. To show the likely disruption, we calculated the expression of the miRNAs (mir168, mir824) and of the targeted mRNAs (AGO1 and MADS-box transcription factor) from our transcriptome data. The expressions of the genes MADS-box and AGO1, in terms of FPKM values, are 101.96 and 303.7, respectively, whereas the expressions of the miRNAs mir824 and mir168 are 16.02 and 19.05, respectively. It thus seems that the expression of the miRNAs is low (as the miRNAs bind to the lncRNAs), whereas that of the mRNAs is high. In addition, we considered another transcriptome dataset [leaf tissue of cluster bean cultivar RGC-936; SRR5428802] to show that eTMs disrupt the miRNA-mRNA regulations. In that dataset, the expressions of the miRNAs mir824 and mir168, in terms of FPKM values, are 1.63 and 19.54, respectively, whereas the expressions of the genes MADS-box and AGO1 are 16.96 and 48.58, respectively.

Fig. 6. Predicted base pairing interactions between Endogenous target mimic (eTM) lncRNAs (red color) and miRNA (green color). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The interaction between lncRNA-miRNA shown as secondary structure of lncRNAs with binding sites of miRNAs. Blue color shows binding site of miRNA on lncRNA. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.6. miRNAs and lncRNAs containing SSRs

The presence of junk DNA, duplications and repeats has led to a high rate of evolution in eukaryotic genomes (Joy et al., 2013). Even though SSRs are evenly distributed throughout the genome, the transcription sites are found to be hot spot regions for SSRs. The formation of SSRs is the consequence of replication slippage during gene expression (Li et al., 2002). SSRs exhibit complex patterns in their frequency of occurrence, function, evolution and mutability. We found that 6% (3381) of the transcripts (Tanwar et al., 2017) have SSRs. Among them, mono, di, tri, tetra, penta and hexa nucleotide repeats account for 42%, 19%, 35%, 1.8%, 0.2% and 0.2%, respectively (Fig. 8a). Among the identified SSRs, (AT) repeats were the most abundant. Slippage does not occur in trinucleotide repeats (TNRs), whereas it is common in mono and di nucleotide repeats (Moxon et al., 1994). Trinucleotide repeats are more varied, interesting and biased in their genomic distribution (Young et al., 2000). It is evident from Fig. 8b that AAG is the most frequent TNR, as is generally found in plant genomes (Gupta et al., 1996). Of the 3381 transcripts bearing SSRs, a total of 556 lncRNAs and 8 pre-miRNAs contain SSRs (Supplementary Table 2).
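To make the SSR thresholds of Section 2.6 concrete (at least 10, 6, 4, 3, 3 and 3 repeats for mono to hexa nucleotide motifs), the following Python sketch scans a sequence for perfect tandem repeats. It is a simplified stand-in for illustration and is not the Misa tool; the function name, the regular-expression approach and the output format are assumptions, and compound or interrupted repeats are not handled.

```python
import re

# Minimum number of repeats per motif length, as stated in Section 2.6.
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len in sorted(min_repeats):
        n = min_repeats[unit_len]
        # A motif of unit_len bases followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGTU]{%d})\1{%d,}" % (unit_len, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA" or "ATAT"); those are reported at the shorter length.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

Applied to each transcript, a scan of this kind yields the counts per repeat class; intersecting the transcripts that carry hits with the lncRNA and pre-miRNA sets then gives the SSR-bearing non-coding RNAs.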


Fig. 10. Functional annotation of targeted genes of cluster bean lncRNAs.

3.7. Target prediction, annotation and enrichment analysis of lncRNAs LncTar (Li et al., 2014c) was applied to observe the interaction between hub of lncRNAs and hub of coding genes (mRNA). The lncRNA-mRNA interaction measured by normalized binding free energy (ndG) with cutoff value ≤ −0.1 was used as standard approach to find whether the interaction is present or not (Florez-Zapata et al., 2016). The target accuracy over 80% from LncTar is normally confirmed by

Fig. 8. Simple sequence repeat analysis in cluster bean. a) Distribution of various type repeats. b) Distribution of trinucleotide SSR.

Fig. 9. Representation of predicted interaction between lncRNA and expressed gene (EST). The ellipse and rectangular nodes represent lncRNAs and expressed gene respectively.

118

Gene 667 (2018) 112–121

S. Sahu et al.

3.9. Evolutionarily conserved lncRNAs

Table 2 Transposable elements in lncRNAs and miRNAs. Items

lncRNA

Pre-miRNA

#Sequences Length (bp) GC (%) Bases masked #Retroelements LINEs:L1/CIN4 LTR(Ty1/Copia) LTR(Gypsy/DIRS1) DNA transposons Hobo-Activator Tourist/Harbinger Small RNA Simple repeats Low complexity Total interspersed repeats (bp)

11,516 3,219,295 43.34 63,033 bp 162 (28,584 bp) 6 (530 bp) 97 (17,450 bp) 59 (10,604 bp) 12 (1290 bp) 3 (376 bp) 1 (59 bp) 2 (129 bp) 648 (25,116 bp) 158 (7837 bp) 29,951

104 19,205 43.62 715 bp 0 0 0 0 0 0 0 1 (218 bp) 16 (497 bp) 0 0

The miRNAs, among the small noncoding RNAs, are highly conserved in plant species (Sunkar and Jagadeeswaran, 2008), However lncRNAs are also reported as diverse throughout the course of evolution (Wang et al., 2004). The putative lncRNAs were searched against CANTATAdb with e-value < 1.0e−10 to check the evolutionary relationship with lncRNAs of other plant species. As expected, merely 54 (0.4% of 11,516) lncRNAs were found conserved. Among them, cluster bean lncRNAs are closer to Glycine max (Fig. 11). The analysis showed that the majority of the lncRNAs of cluster bean are specific to it. Our finding corroborates with that reported by Wang et al. (2004), i.e. the mutation rate in plant lncRNA is very high and rapid showing their diverse evolution (Marques and Ponting, 2014). 3.10. CbLncRNAdb The online portal will be useful for the users to retrieve the lncRNA sequences of cluster bean on the basis of its physical properties like length, expression value, cpc index, ORF, exon from the menu “lncRNAs” of the database. In order to make the database user-friendly, a help manual has also been provided. A list of HPlncRNAs can be retrieved from the menu “HPlncRNA” of the database with details listed under “lncRNAs”. The “Statistics” menu provides the information of SSR distribution, High Probable and Less Probable lncRNAs, Repetitive Elements of lncRNAs and miRNAs and targets & functions of significant lncRNAs. 4. Conclusion The present study describes the in silico characterization and identification of lncRNAs, miRNAs and eTMs. The analysis indicated the involvement of lncRNAs in gene regulatory networks of abiotic stress tolerance, disease resistance and cell wall synthesis. The PCA based index proposed by us can help in the identification of effective HPlncRNAs for further wet lab validation. Besides, we have identified lncRNAs and miRNAs bearing SSRs, which can be used as markers for breeding in cluster bean improvement programme. 
Our study also indicates the involvement of lncRNAs in the synthesis of galactomannan gum in cluster bean, which is important for the textile industry. We also found eTMs complementarily mimicking the miRNAs that putatively target Argonaute (AGO1) and MADS-box transcription factors. A database named CbLncRNAdb has also been developed and made publicly available at http://cabgrid.res.in/cblncrnadb to support researchers working on ncRNAs and the cluster bean crop.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.gene.2018.05.027.

Fig. 11. Conserved cluster bean lncRNAs in different plant species.
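The conservation search summarized in Fig. 11 can be illustrated with a minimal sketch. It assumes the similarity search produced BLAST-style tabular output (12 tab-separated columns, e-value in column 11); the paper does not state the output format, and the `species|id` subject naming, the sample hits and all function names below are hypothetical. Only the e-value cutoff (1.0e−10) and the counts 54 and 11,516 come from the text.

```python
from collections import Counter

E_VALUE_CUTOFF = 1.0e-10  # threshold stated in the text


def conserved_queries(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query lncRNA to its best subject hit with e-value below cutoff.

    Assumes BLAST-style tabular lines: query, subject, ..., e-value (column 11), bitscore.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue  # hit is not significant enough to count as conserved
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def species_counts(best_hits, sep="|"):
    """Count conserved lncRNAs per species (hypothetical 'species|id' subject IDs)."""
    return Counter(subject.split(sep)[0] for subject in best_hits.values())


def percent_conserved(n_conserved, n_total):
    """Percentage of lncRNAs with at least one significant hit."""
    return 100.0 * n_conserved / n_total


# Hypothetical hits, for illustration only.
sample = [
    "lnc1\tGmax|L1\t90.0\t200\t5\t0\t1\t200\t1\t200\t1e-50\t300",
    "lnc1\tAtha|L9\t85.0\t150\t9\t0\t1\t150\t1\t150\t1e-20\t150",
    "lnc2\tOsat|L3\t80.0\t60\t8\t0\t1\t60\t1\t60\t1e-5\t40",
    "lnc3\tGmax|L7\t88.0\t120\t6\t0\t1\t120\t1\t120\t3e-12\t80",
]
hits = conserved_queries(sample)
print(species_counts(hits))
print(round(percent_conserved(54, 11516), 2))
```

Keeping only the best hit per query means each lncRNA is assigned to a single closest species, which is one plausible way to arrive at a per-species breakdown like that in Fig. 11; the authors may have counted hits differently.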

the biological experiments. It has been shown that lncRNAs regulate a number of important biological processes by interacting with their target genes. We found that only 244 lncRNAs participated in lncRNA-mRNA interactions, and Cytoscape was used to draw the network (Fig. 9). The network suggested that most of the lncRNAs target abiotic stress related genes such as GLYSO peroxygenase and GLYSO BAG family molecular chaperone regulator (Fig. 10). The network model showed the role of lncRNA-mRNA interactions in DNA damage repair, cell wall synthesis, disease resistance and resistance against microbial pathogens.
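As an illustration of how such an lncRNA-mRNA network can be summarized before drawing it, the following minimal sketch builds the network from a list of predicted interaction pairs, ranks targets by the number of lncRNAs pointing at them, and writes the edges in the simple interaction format (SIF) that Cytoscape can import. The identifiers, the relation label and the function names are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def build_network(pairs):
    """Return (lncRNA -> set of targets, target -> set of lncRNAs)."""
    lnc_to_targets = defaultdict(set)
    target_to_lncs = defaultdict(set)
    for lnc, target in pairs:
        lnc_to_targets[lnc].add(target)
        target_to_lncs[target].add(lnc)
    return dict(lnc_to_targets), dict(target_to_lncs)


def top_targets(target_to_lncs, n=3):
    """Rank targets by how many distinct lncRNAs interact with them."""
    ranked = sorted(target_to_lncs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(target, len(lncs)) for target, lncs in ranked[:n]]


def to_sif(pairs, relation="targets"):
    """Serialize unique edges as 'source<TAB>relation<TAB>target' lines (SIF)."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in sorted(set(pairs)))


# Hypothetical interaction pairs, for illustration only.
pairs = [
    ("lncA", "peroxygenase"),
    ("lncB", "peroxygenase"),
    ("lncA", "BAG_chaperone"),
    ("lncA", "peroxygenase"),  # duplicate prediction, collapsed by the sets
]
lnc_to_targets, target_to_lncs = build_network(pairs)
print(len(lnc_to_targets), "lncRNAs participate in interactions")
print(top_targets(target_to_lncs, n=1))
print(to_sif(pairs))
```

Counting the keys of `lnc_to_targets` gives the number of participating lncRNAs (the quantity reported as 244 in the text), and the ranked target list is one simple way to see which gene families dominate the network.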

3.8. Characterization of transposable elements in lncRNAs and miRNAs

In the present study, TEs were found in the lncRNAs and pri-miRNAs (Table 2). However, very few TE-derived miRNAs were found in the cluster bean transcriptome data, as the whole genome sequence of cluster bean is not yet available. In general, TE-derived miRNAs are common in animals but rare in plants (Li et al., 2011). Kelley and Rinn (2012) reported an evolutionary relationship between lncRNAs and transposable elements, and further showed that TEs have significantly shaped the noncoding transcriptome. The number of TE-derived miRNAs we observed from transcriptome data is in line with that observed by Li et al. (2011) from rice, Arabidopsis and wheat transcriptome data. However, more TE-derived miRNAs are recovered from whole genome data, which include the intronic sequences (Li et al., 2011). Similarly, Sun et al. (2012) found a large number of TE-derived miRNAs in the whole genome data of species such as Sorghum bicolor and Populus trichocarpa.

Conflict of interest statement

The authors declare no competing financial interest.

Authors and contributors

ARR and TM conceived the study; SS and ARR designed the study and developed the index; SS, SG, KG collected and analyzed data; ARR, JP developed web server; SS, ARR, KG, SG and TM drafted and finalized the manuscript. All authors read and approved the final version of the manuscript.

Funding

This study was supported by the grants ICAR-Consortia Research Platform on Genomics (CRP-Genomics/IX/2017) and CABin Scheme Network project on Agricultural Bioinformatics and Computational Biology (F.No. Agril.Edn. 14/2/2017-A&P dated 02.08.2017), received from Indian Council of Agricultural Research (ICAR). The funding body played no role in design or conclusion of this study.


References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215 (3), 403–410.
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., et al., 2003. A uniform system for microRNA annotation. RNA 9 (3), 277–279.
Andrews, S., 2010. FastQC: A Quality Control Tool for High Throughput Sequence Data. (Software).
Bhullar, G.S., Bhullar, N.K., 2012. Agricultural Sustainability: Progress and Prospects in Crop Research, 1st ed. Academic Press.
Boerner, S., McGinnis, K.M., 2012. Computational identification and functional predictions of long noncoding RNA in Zea mays. PLoS One 7 (8), e43047.
Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30 (15), 2114–2120.
Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., et al., 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339 (6121), 819–823.
Denman, R.B., 1993. Using RNAFOLD to predict the activity of small catalytic RNAs. BioTechniques 15 (6), 1090–1095.
Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., et al., 2012. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22 (9), 1775–1789.
Ding, J., Lu, Q., Ouyang, Y., Mao, H., Zhang, P., Yao, J., et al., 2012. A long noncoding RNA regulates photoperiod-sensitive male sterility, an essential component of hybrid rice. Proc. Natl. Acad. Sci. 109 (7), 2654–2659.
Eddy, S.R., 2001. HMMER: Profile Hidden Markov Models for Biological Sequence Analysis.
Edwards, M.E., Dickson, C.A., Chengappa, S., Sidebottom, C., Gidley, M.J., Reid, J.S., 1999. Molecular characterisation of a membrane-bound galactosyltransferase of plant cell wall matrix polysaccharide biosynthesis. Plant J. 19 (6), 691–697.
Faik, A., Price, N.J., Raikhel, N.V., Keegstra, K., 2002. An Arabidopsis gene encoding an α-xylosyltransferase involved in xyloglucan biosynthesis. Proc. Natl. Acad. Sci. 99 (11), 7797–7802.
Fan, C., Hao, Z., Yan, J., Li, G., 2015. Genome-wide identification and functional analysis of lincRNAs acting as miRNA targets or decoys in maize. BMC Genomics 16 (1), 793.
Farazi, T.A., Juranek, S.A., Tuschl, T., 2008. The growing catalog of small RNAs and their association with distinct Argonaute/Piwi family members. Development 135 (7), 1201–1214.
Florez-Zapata, N.M.V., Reyes-Valdes, M.H., Martinez, O., 2016. Long non-coding RNAs are major contributors to transcriptome changes in sunflower meiocytes with different recombination rates. BMC Genomics 17 (1), 490.
Franco-Zorrilla, J.M., Valli, A., Todesco, M., Mateos, I., Puga, M.I., Rubio-Somoza, I., et al., 2007. Target mimicry provides a new mechanism for regulation of microRNA activity. Nat. Genet. 39 (8), 1033.
German, M.A., Pillay, M., Jeong, D.-H., Hetawal, A., Luo, S., Janardhanan, P., et al., 2008. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol. 26 (8), 941.
Gkirtzou, K., Tsamardinos, I., Tsakalides, P., Poirazi, P., 2010. MatureBayes: a probabilistic algorithm for identifying the mature miRNA within novel precursors. PLoS One 5 (8), e11843.
Greilhuber, J., Dolezel, J., Lysak, M.A., Bennett, M.D., 2005. The origin, evolution and proposed stabilization of the terms ‘genome size’ and ‘C-value’ to describe nuclear DNA contents. Ann. Bot. 95 (1), 255–260.
Griffiths-Jones, S., Grocock, R.J., Van Dongen, S., Bateman, A., Enright, A.J., 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34 (Suppl. 1), D140–D144.
Gupta, P.K., Balyan, H.S., Sharma, P.C., Ramesh, B., 1996. Microsatellites in plants: a new class of molecular markers. Curr. Sci. 70 (1).
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., Macmanes, M.D., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C.N., Henschel, R., Leduc, R.D., Friedman, N., Regev, A., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512.
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., et al., 2013a. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8 (8), 1494–1512.
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., et al., 2013b. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat. Protoc. 8 (8).
Hao, Z., Fan, C., Cheng, T., Su, Y., Wei, Q., Li, G., 2015. Genome-wide identification, characterization and evolutionary analysis of long intergenic noncoding RNAs in cucumber. PLoS One 10 (3), e0121800.
Heo, J.B., Sung, S., 2011. Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 331 (6013), 76–79.
Hymowitz, T., Upadhya, M.D., 1963. The chromosome number of Cyamopsis serrata Schinz. Curr. Sci. 32 (9), 427–428.
Jones-Rhoades, M.W., Bartel, D.P., Bartel, B., 2006. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. 57. http://dx.doi.org/10.1146/annurev.arplant.57.032905.105218.
Joy, N., Soniya, E.V., 2012. Identification of an miRNA candidate reflects the possible significance of transcribed microsatellites in the hairpin precursors of black pepper. Funct. Integr. Genomics 12 (2), 387–395.
Joy, N., Asha, S., Mallika, V., Soniya, E.V., 2013. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of ‘pre-miRNAs’ of black pepper. PLoS One 8 (3), e56694.
Kapusta, A., Feschotte, C., 2014. Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications. Trends Genet. 30 (10), 439–452.
Kelley, D., Rinn, J., 2012. Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol. 13 (11), R107.
Khemka, N., Singh, V.K., Garg, R., Jain, M., 2016. Genome-wide analysis of long intergenic non-coding RNAs in chickpea and their potential role in flower development. Sci. Rep. 6, 33297.
Kidner, C.A., Martienssen, R.A., 2005. The developmental role of microRNA in plants. Curr. Opin. Plant Biol. 8 (1), 38–44.
Kong, L., Zhang, Y., Ye, Z.-Q., Liu, X.-Q., Zhao, S.-Q., Wei, L., et al., 2007. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35 (Suppl. 2), W345–W349.
Lakhotia, S.C., 2016. Non-coding RNAs have key roles in cell regulation. Proc. Indian Natl. Sci. Acad. 82 (4), 1171–1182.
Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9 (4), 357–359.
Li, Y.C., Korol, A.B., Fahima, T., Beiles, A., Nevo, E., 2002. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol. Ecol. 11 (12), 2453–2465.
Li, Y., Li, C., Xia, J., Jin, Y., 2011. Domestication of transposable elements into microRNA genes in plants. PLoS One 6 (5), e19212.
Li, A., Zhang, J., Zhou, Z., 2014a. PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinf. 15 (1), 311.
Li, L., Eichten, S.R., Shimizu, R., Petsch, K., Yeh, C.-T., Wu, W., et al., 2014b. Genome-wide discovery and characterization of maize long non-coding RNAs. Genome Biol. 15 (2), R40.
Li, J., Ma, W., Zeng, P., Wang, J., Geng, B., Yang, J., et al., 2014c. LncTar: a tool for predicting the RNA targets of long noncoding RNAs. Brief. Bioinform. 16 (5), 806–812.
Liu, J., Jung, C., Xu, J., Wang, H., Deng, S., Bernad, L., et al., 2012. Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24 (11), 4333–4345.
Marques, A.C., Ponting, C.P., 2014. Intergenic lncRNAs and the evolution of gene expression. Curr. Opin. Genet. Dev. 27, 48–53.
Mercer, T.R., Dinger, M.E., Mattick, J.S., 2009. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 10 (3), 155–159.
Misganaw, A., Abera, S., 2017. Genetic diversity assessment of Guzoita abyssinica using EST derived simple sequence repeats (SSRs) markers. African J. Plant Sci. 11 (4), 79–85.
Moxon, E.R., Rainey, P.B., Nowak, M.A., Lenski, R.E., 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr. Biol. 4 (1), 24–33.
Mu, C., Wang, R., Li, T., Li, Y., Tian, M., Jiao, W., et al., 2016. Long non-coding RNAs (lncRNAs) of sea cucumber: large-scale prediction, expression profiling, non-coding network construction, and lncRNA-microRNA-gene interaction analysis of lncRNAs in Apostichopus japonicus and Holothuria glaberrima during LPS challenge and radial organ complex regeneration. Mar. Biotechnol. 18 (4), 485–499.
Mudgil, D., Barak, S., Khatkar, B.S., 2014. Guar gum: processing, properties and food applications - a review. J. Food Sci. Technol. 51 (3), 409–418.
Oliveira, E.J., Pádua, J.G., Zucchi, M.I., Vencovsky, R., Vieira, M.L.C., 2006. Origin, evolution and genome distribution of microsatellites. Genet. Mol. Biol. 29 (2), 294–307.
Ponting, C.P., Oliver, P.L., Reik, W., 2009. Evolution and functions of long noncoding RNAs. Cell 136. http://dx.doi.org/10.1016/j.cell.2009.02.006.
Rajagopalan, R., Vaucheret, H., Trejo, J., Bartel, D.P., 2006. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 20 (24), 3407–3425.
Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., Bartel, D.P., 2002. Prediction of plant microRNA targets. Cell 110 (4), 513–520.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., Bradley, A., 2004. Identification of mammalian microRNA host genes and transcription units. Genome Res. 14 (10a), 1902–1910.
Stanke, M., Steinkamp, R., Waack, S., Morgenstern, B., 2004. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32 (suppl_2), W309–W312.
Sun, J., Zhou, M., Mao, Z., Li, C., 2012. Characterization and evolution of microRNA genes derived from repetitive elements and duplication events in plants. PLoS One 7 (4), e34092.
Sunkar, R., Jagadeeswaran, G., 2008. In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biol. 8 (1), 37.
Tang, W., Zheng, Y., Dong, J., Yu, J., Yue, J., Liu, F., et al., 2016. Comprehensive transcriptome profiling reveals long noncoding RNA expression and alternative splicing regulation during fruit development and ripening in kiwifruit (Actinidia chinensis). Front. Plant Sci. 7.
Tanwar, U.K., Pruthi, V., Randhawa, G.S., 2017. RNA-seq of guar (Cyamopsis tetragonoloba, L. Taub.) leaves: de novo transcriptome assembly, functional annotation and development of genomic resources. Front. Plant Sci. 8.
Wang, J., Zhang, J., Zheng, H., Li, J., Liu, D., Li, H., et al., 2004. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature 431 (7010).
Wang, T.-Z., Liu, M., Zhao, M.-G., Chen, R., Zhang, W.-H., 2015. Identification and characterization of long non-coding RNAs involved in osmotic and salt stress in Medicago truncatula using genome-wide high-throughput sequencing. BMC Plant Biol. 15 (1), 131.
Wang, Y., Li, X., Tao, B., 2016. Improving classification of mature microRNA by solving class imbalance problem. Sci. Rep. 6.
Wilusz, J.E., Sunwoo, H., Spector, D.L., 2009. Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 23 (13), 1494–1504.
Wu, H.-J., Ma, Y.-K., Chen, T., Wang, M., Wang, X.-J., 2012. PsRobot: a web-based plant small RNA meta-analysis toolbox. Nucleic Acids Res. 40 (W1), W22–W28.


Wu, H.-J., Wang, Z.-M., Wang, M., Wang, X.-J., 2013. Widespread long noncoding RNAs as endogenous target mimics for microRNAs in plants. Plant Physiol. 161 (4), 1875–1884.
Xie, C., Yuan, J., Li, H., Li, M., Zhao, G., Bu, D., et al., 2014. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res. 42 (D1), D98–D103.
Young, E.T., Sloan, J.S., Van Riper, K., 2000. Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154 (3), 1053–1068.
Young, R.S., Marques, A.C., Tibbit, C., Haerty, W., Bassett, A.R., Liu, J.-L., et al., 2008. Identification and properties of 1,119 candidate lincRNA loci in the Drosophila melanogaster genome. Genome Biol. Evol. 4 (4), 427–442.
Zhang, X., Yazaki, J., Sundaresan, A., Cokus, S., Chan, S.W.L., Chen, H., et al., 2006. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126 (6), 1189–1201.
Zhang, Z., Zhu, Z., Watabe, K., Zhang, X., Bai, C., Xu, M., et al., 2013. Negative regulation of lncRNA GAS5 by miR-21. Cell Death Differ. 20 (11), 1558–1568.
Zhang, Y.-C., Liao, J.-Y., Li, Z.-Y., Yu, Y., Zhang, J.-P., Li, Q.-F., et al., 2014. Genome-wide screening and functional analysis identify a large number of long noncoding RNAs involved in the sexual reproduction of rice. Genome Biol. 15 (12), 512.
Zhu, J., Fu, H., Wu, Y., Zheng, X., 2013. Function of lncRNAs and approaches to lncRNA-protein interactions. Sci. China Life Sci. 56 (10), 876–885.
Zhu, B., Yang, Y., Li, R., Fu, D., Wen, L., Luo, Y., et al., 2015. RNA sequencing and functional analysis implicate the regulatory role of long non-coding RNAs in tomato fruit ripening. J. Exp. Bot. 66 (15), 4483–4495.
