JOURNAL OF GENETICS AND GENOMICS
J. Genet. Genomics 35 (2008) 349−355
www.jgenetgenomics.org
Genome-wide mapping of conserved microRNAs and their host transcripts in Tribolium castaneum
Qibin Luo a, b, 1, Qing Zhou b, c, 1, Xiaomin Yu b, c, Hongbin Lin b, c, Songnian Hu a, b, Jun Yu a, b, *
a James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310008, China
b Key Laboratory of Genome Information and Sciences, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
c Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
Received for publication 29 October 2007; Revised 18 March 2008; Accepted 19 March 2008
Abstract
MicroRNAs (miRNAs) are endogenous ~22-nt RNAs that play important regulatory roles through post-transcriptional gene silencing. A computational strategy based on features of known metazoan miRNAs has been developed to identify conserved miRNAs in the red flour beetle (Tribolium castaneum), which is regarded as one of the major laboratory models of arthropods. Among 118 putative miRNAs, 47% are harbored by known protein-coding genes (intronic) and 53% are located outside them (intergenic). There are 31 intronic miRNAs in the same transcriptional orientation as their host genes, which may share RNA polymerase II and the spliceosomal machinery with their host genes for their biogenesis. A hypothetical feedback model is proposed based on the analysis of the relationship between intronic miRNAs and their host genes in the development of the red flour beetle.
Keywords: miRNA; host transcript; intronic miRNA; Tribolium castaneum
* Corresponding author. Tel/Fax: +86-01-8049-8676. E-mail address: [email protected]
1 These authors contributed equally to this work.
Introduction
MicroRNAs (miRNAs) have been found in a wide range of species from animals to plants (Bartel, 2004; Kim and Nam, 2006). There are two pathways involved in processing miRNAs. In animals, the primary transcripts (pri-miRNAs), each consisting of one or more hairpin precursor sequences (pre-miRNAs), are transcribed in the nucleus either together with their host genes or independently (Bartel, 2004; Lee et al., 2004; Borchert et al., 2006). Pri-miRNAs are subsequently cleaved by Drosha into pre-miRNAs ranging from 60 to 130 nucleotides in length. Pre-miRNAs are then exported to the cytosol, where Dicer cleaves them into mature miRNAs of 18–23 nucleotides (Lee et al., 2003; Bartel, 2004). Since the discovery of the very first miRNA, several investigators have contributed to exploring the functions of miRNAs in plants, nematodes, insects, and vertebrates (Bartel, 2004; Kim and Nam, 2006). Recent studies have suggested that miRNAs regulate gene expression post-transcriptionally by pairing perfectly or imperfectly with the 3′ untranslated region (3′-UTR) of target mRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). The complementary mRNAs are cleaved by the RNA-induced silencing complex (RISC) to suppress the target genes (Bartel, 2004). Based on intensive research on model organisms, including human, Arabidopsis, Drosophila, and Caenorhabditis elegans, thousands of miRNAs have been identified through experimental and/or computational approaches (Lagos-Quintana et al., 2001; Grad et al., 2003; Lai et al., 2003; Adai et al., 2005; Bentwich et al., 2005; Berezikov et al., 2005; Lindow and Krogh, 2005; Weber, 2005). Most identifications of miRNAs have been
focused on their mature forms, which show length and sequence conservation, as well as on the hairpin structure of the pre-miRNA (Yoon and de Micheli, 2006). However, pre-miRNAs are harder to identify because they lack sequence conservation (Yoon and de Micheli, 2006). As a result, several computational tools have been developed to predict pre-miRNAs based on the hairpin structure, such as miRscan, miRseeker, and various machine learning methods (Yoon and de Micheli, 2006; Yousef et al., 2006; Yan et al., 2007). The complete genome sequence of the red flour beetle, Tribolium castaneum, provides an opportunity for us to identify its novel miRNAs (Brown et al., 2003). As the second model insect after the well-studied Drosophila melanogaster, T. castaneum has several typical characteristics that are similar to those of the majority of arthropods (Brown et al., 2003). In this study, using a computational strategy, we predicted 118 putative miRNAs, identified their host transcripts, and classified the host transcripts into two types according to whether the miRNAs reside inside or outside known transcripts; the transcriptional units that carry primary miRNAs may therefore be transcribed by either RNA polymerase II or III. We not only identified 31 intronic miRNAs and their 22 host genes but also surveyed the expression of those intronic miRNAs based on EST (expressed sequence tag) data.
Materials and methods
Data set
We prepared three data sets for this study: 1) the reference miRNA set, which contains 3,858 mature metazoan miRNAs from the miRNA Registry Database (Release 10.0, August 2007; http://microrna.sanger.ac.uk) (Ambros et al., 2003; Griffiths-Jones, 2004; Griffiths-Jones et al., 2006); 2) the genome sequence, and 3) the ESTs of T. castaneum from GenBank at the National Center for Biotechnology Information (NCBI).
Computational miRNA predictions
The detailed procedure of identifying putative miRNAs in T. castaneum is summarized in Fig. 1. We first aligned the reference miRNA data set against the T. castaneum genome sequence using BLASTN with a word size of 7 and a gap penalty of –1. We retained hits with fewer than 3 mismatched nucleotides per aligned pair. After removing repetitive and protein-encoding sequences, we extracted each candidate miRNA together with 100 nt of upstream and 100 nt of downstream sequence (~220 bp in total) as the input data. The candidate sequences were then folded with RNAfold (Denman, 1993) to predict hairpins. We varied the sliding window size from 60 nt to 120 nt and compared the folding energies of all windows within the same candidate miRNA sequence. Finally, we identified the sequences with the minimum folding energy as candidate hairpins, followed by careful manual inspection.
Fig. 1. Flow chart describing procedures for predicting miRNAs in the T. castaneum (Tca) genome.
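The candidate extraction and sliding-window scan described in the Methods can be sketched roughly as follows. This is a minimal illustration rather than the pipeline actually used: the function names, the 0-based end-exclusive coordinates, and the made-up sequence are assumptions, and `toy_energy` is a hypothetical placeholder that would need to be replaced by the minimum free energy reported by a folding program such as RNAfold.

```python
from typing import Callable, Iterator, Tuple


def extract_candidate(genome: str, hit_start: int, hit_end: int,
                      flank: int = 100) -> Tuple[int, str]:
    """Return (offset, sequence) for a hit plus `flank` nt on each side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch);
    the region is clipped at the ends of the sequence.
    """
    start = max(0, hit_start - flank)
    end = min(len(genome), hit_end + flank)
    return start, genome[start:end]


def sliding_windows(seq: str, min_len: int = 60, max_len: int = 120,
                    step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every window size from min_len to max_len."""
    for size in range(min_len, min(max_len, len(seq)) + 1):
        for start in range(0, len(seq) - size + 1, step):
            yield start, seq[start:start + size]


def best_hairpin_seed(seq: str,
                      energy: Callable[[str], float]) -> Tuple[float, int, str]:
    """Return (energy, start, window) for the window with the lowest energy."""
    return min(((energy(w), s, w) for s, w in sliding_windows(seq)),
               key=lambda item: item[0])


def toy_energy(window: str) -> float:
    """Hypothetical placeholder, NOT a real folding energy.

    It only rewards G/C content so that the example runs without RNAfold.
    """
    return -float(sum(base in "GC" for base in window))


# Made-up sequence with a 22-nt "hit" at positions 200..222.
genome = "AT" * 100 + "GCGCGGCCGCGGGCCGCGCCGC" + "AT" * 100
offset, candidate = extract_candidate(genome, 200, 222)
energy, start, seed = best_hairpin_seed(candidate, toy_energy)
print(len(candidate), offset, start, len(seed), energy)
```

Passing the energy function as a parameter keeps the window enumeration independent of the folding tool. Note that raw folding energies tend to become more negative as windows get longer, so a real comparison across window sizes may need the manual inspection the Methods mention.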
Results
Prediction of putative miRNAs in T. castaneum
We used our computational procedures to scan the whole genome data for the identification of putative miRNA homologs based on a reference miRNA data set. As the first phase of miRNA prediction, we set a relatively stringent parameter allowing no more than 2-nt mismatches. As the length of pre-miRNA (precursor of miRNA) varies mostly from 50 nt to 130 nt among metazoan species, we extracted candidate miRNA sequences with 100-nt upstream and downstream sequences of each match. To find the appropriate length of the pre-miRNA, we applied a variable sliding window to scan along each candidate miRNA sequence. Each scan created one potential pre-miRNA sequence, which we named a hairpin seed. We finally identified appropriate hairpins after passing the sequences through the second phase of our protocol. We used the following computational criteria for screening the hairpin seeds: 1) pre-miRNA sequences can fold into an appropriate hairpin secondary structure; 2) the
length of hairpin seeds should range from 50 bp to 130 bp; 3) the putative mature miRNA must be located along the stem, and only one loop is allowed in each hairpin; 4) miRNA precursors in secondary structures should have strongly negative minimal free energies (at most –15 kcal/mol) among hairpin seeds of the same type. We further examined manually all the hairpin seeds that satisfied the above criteria and removed the candidate precursors with low folding energy or unreasonable hairpin structures. In the final phase, we manually selected the most suitable hairpin candidates according to their secondary structure conservation within the same family and named them putative miRNAs. The putative miRNAs were annotated based on the miRBase definition when available. The Supplementary Table summarizes, for each putative miRNA, the genomic coordinates of each occurrence of the mature and putative precursor sequences within the T. castaneum genome, the location relative to coding sequences (CDS) of the T. castaneum official gene set (intergenic, intronic, or overlapping a CDS), the folding energies, and the genome sequences of the putative mature miRNA and putative hairpin region.
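The four screening criteria can be illustrated with a small filter over a predicted structure in dot-bracket notation (the notation RNAfold emits). This is only a sketch under stated assumptions: "only one loop" is interpreted here as a single terminal hairpin loop, "located along the stem" as lying entirely on one arm with most bases paired, and `min_paired_fraction` is a hypothetical threshold that does not come from the paper.

```python
import re

# An innermost "(" ... ")" pair that encloses only unpaired bases is a hairpin loop.
HAIRPIN_LOOP = re.compile(r"\(\.*\)")


def passes_hairpin_criteria(structure: str, mfe: float,
                            mature_start: int, mature_end: int,
                            min_len: int = 50, max_len: int = 130,
                            max_mfe: float = -15.0,
                            min_paired_fraction: float = 0.7) -> bool:
    """Check a hairpin seed against the length, loop, stem, and energy criteria.

    `structure` is a dot-bracket string; the mature miRNA occupies
    structure[mature_start:mature_end] (0-based, end-exclusive).
    `min_paired_fraction` is an illustrative assumption.
    """
    # Criterion 2: precursor length between 50 and 130 nt.
    if not min_len <= len(structure) <= max_len:
        return False
    # Criterion 4: minimal free energy of at most -15 kcal/mol.
    if mfe > max_mfe:
        return False
    # Criteria 1 and 3: exactly one terminal loop.
    loops = list(HAIRPIN_LOOP.finditer(structure))
    if len(loops) != 1:
        return False
    loop_open = loops[0].start()        # index of the last "(" before the loop
    loop_close = loops[0].end() - 1     # index of the first ")" after the loop
    # Criterion 3: the mature miRNA lies entirely on the 5' arm or the 3' arm.
    on_one_arm = mature_end <= loop_open + 1 or mature_start >= loop_close
    if not on_one_arm or mature_end <= mature_start:
        return False
    mature = structure[mature_start:mature_end]
    paired = sum(ch != "." for ch in mature)
    return paired / len(mature) >= min_paired_fraction


# Made-up 70-nt hairpin: 25-bp stem, 10-nt terminal loop, 5-nt tails.
hairpin = "." * 5 + "(" * 25 + "." * 10 + ")" * 25 + "." * 5
print(passes_hairpin_criteria(hairpin, -25.0, 8, 30))
```

The manual inspection steps described above are not captured by such a filter; it only automates the parts of the criteria that can be read directly from the predicted structure and energy.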
Table 1 Intronic miRNAs in T. castaneum genome
MicroRNA        Scaffold          Host_gene (CDS)  Location
tca-miR-12      TcaLG3            655429           Intron 5
tca-miR-283     TcaLG3            655429           Intron 5
tca-miR-200*    TcaLG9            656966           Intron 20
tca-miR-8       TcaLG9            656966           Intron 20
tca-miR-929     TcaLGUn_WGA184_1  657060           Intron 2
tca-miR-79      TcaLGUn_WGA231_1  658323           Intron 2
tca-miR-277     TcaLG7            658676           Intron 3
tca-miR-317     TcaLG7            658676           Intron 3
tca-miR-34      TcaLG7            658676           Intron 3
tca-miR-507b-2  TcaLG8            659404           Intron 1
tca-miR-792     TcaLG2            659838           Intron 8
tca-miR-190     TcaLG7            661162           Intron 16
tca-miR-18      TcaLG9            661743           Intron 5
tca-miR-33a     TcaLG4            663039           Intron 7
tca-miR-932     TcaLG5            663865           Intron 2
tca-miR-139     TcaLG7            664235           Intron 1
tca-miR-925     TcaLG5            655682           Intron 2
tca-miR-754     TcaLG5            656017           Intron 12
tca-miR-265     TcaLG3            656506           Intron 1
tca-miR-877     TcaLG3            656921           Intron 4
tca-miR-7       TcaLG7            656952           Intron 8
tca-miR-71b*-1  TcaLGUn_WGA143_1  657441           Intron 1
tca-miR-307     TcaLG8            657982           Intron 2
tca-miR-263b    TcaLG3            658026           Intron 4
tca-miR-13a     TcaLG3            662075           Intron 2
tca-miR-13b     TcaLG3            662075           Intron 2
tca-miR-2-1     TcaLG3            662075           Intron 2
tca-miR-2-2     TcaLG3            662075           Intron 2
tca-miR-2-3     TcaLG3            662075           Intron 2
tca-miR-71a     TcaLG3            662075           Intron 2
tca-miR-927     TcaLG9            663217           Intron 1
An asterisk indicates a minor miR* sequence.
Identification of host transcripts harboring miRNAs
Previous studies have demonstrated that most miRNA genes are encoded in intergenic regions and are transcribed by RNA polymerase II (Pol II) or RNA polymerase III (Pol III) (Lee et al., 2004; Borchert et al., 2006; Ying and Lin, 2006). However, a class of miRNAs, known as intronic miRNAs, is located in introns of protein-coding genes in the same transcriptional orientation as their host genes (Ying and Lin, 2004, 2005, 2006). In addition, several intergenic miRNAs are located in either exons or introns of non-coding transcription units (TUs) and are transcribed with the corresponding host transcripts (Rodriguez et al., 2004). Some of them may directly form the pri-miRNAs, which are subsequently capped and polyadenylated like other Pol II transcripts (Lee et al., 2004). Based on these studies, we propose that the host transcripts of miRNAs can be classified as transcription units or host genes. Of our predicted miRNAs, 56 are located within protein-coding genes. Because miRNAs in the reverse orientation relative to the gene that contains them may neither share its promoter nor be transcribed with it, we excluded such miRNAs from the intronic miRNAs and grouped their host transcripts into transcription units. Accordingly, we found 31 miRNAs located in 22 host genes and classified them as intronic miRNAs (Table 1).
Analysis of intronic miRNA expression based on EST database
We validated intronic miRNA expression using 64,571 ESTs from T. castaneum and identified only one intronic miRNA, miR-277. This low detection rate is consistent with previous findings in other model organisms, such as human, rat, and fruit fly, of approximately one miRNA per 10,000–20,000 ESTs, although it also depends on data quality and quantity (Hubbard et al., 2005; Zhang et al., 2006). We further searched for expression evidence of the other miRNAs and detected one more miRNA, miR-250. However, miR-250 is located in the intron of a protein-coding gene in the reverse orientation, so it may be transcribed as part of another transcription unit.
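The orientation rule used in this section to separate intronic miRNAs from those assigned to other transcription units can be sketched as below. The `Feature` record, the label strings, and all coordinates are hypothetical; for brevity the sketch tests containment within the whole gene span rather than within an annotated intron, which is a simplification of the analysis described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """A stranded genomic interval (hypothetical record for this sketch)."""
    scaffold: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_mirna(mirna: Feature, genes: List[Feature]) -> str:
    """Label a miRNA by its position and orientation relative to coding genes.

    - "intronic": inside a gene on the same strand (may share its promoter)
    - "transcription_unit": inside a gene only on the opposite strand
    - "intergenic": not inside any protein-coding gene
    """
    inside_opposite = False
    for gene in genes:
        contained = (gene.scaffold == mirna.scaffold
                     and gene.start <= mirna.start
                     and mirna.end <= gene.end)
        if not contained:
            continue
        if gene.strand == mirna.strand:
            return "intronic"
        inside_opposite = True
    return "transcription_unit" if inside_opposite else "intergenic"


# Made-up coordinates for illustration only.
genes = [Feature("TcaLG3", 1000, 9000, "+")]
for locus in (Feature("TcaLG3", 2000, 2090, "+"),
              Feature("TcaLG3", 2000, 2090, "-"),
              Feature("TcaLG7", 2000, 2090, "+")):
    print(locus, classify_mirna(locus, genes))
```

Under this rule a miRNA such as miR-250, which lies in an intron but on the opposite strand, would fall into the "transcription_unit" group rather than the intronic group.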
Discussion
Characterization of miRNAs in T. castaneum
The identification procedure allowed us to predict 118 miRNAs. After tracing back to the reference miRNA data set, we found 65 homologs of known insect
miRNAs and the rest matched those of other metazoan species. We detected 42 families from the 65 miRNAs based on sequence and structural similarity to precursor miRNA. Of the 42 families, 14 (33%) contain more than one miRNA gene and five families (12%) contain three or more miRNA genes. Among the 118 putative miRNAs, there are 20 families with more than one miRNA gene and
8 families with three or more (Table 2). These miRNA families are hypothesized to result from historical expansion events such as amplification and diversification, which normally happen in families of protein-coding genes (Prince and Pickett, 2002). Considering that miRNA genes have undergone gene duplication and diversification processes resembling those of protein-coding genes in insect
Table 2 MiRNA families in T. castaneum genome
MicroRNA family  MicroRNA         MicroRNA family  MicroRNA           MicroRNA family  MicroRNA
bantam           tca-bantam       mir-263          tca-miR-263a       mir-71           tca-miR-71b-3p-1
bantam           tca-bantam-c     mir-263          tca-miR-263b       mir-71           tca-miR-71b-3p-2
let-7            tca-let-7a       mir-275          tca-miR-275        mir-71           tca-miR-71b-3p-3
let-7            tca-let-7b       mir-276          tca-miR-276-3p     mir-8            tca-miR-200*
let-7            tca-let-7c       mir-276          tca-miR-276-5p     mir-8            tca-miR-8
mir-1            tca-miR-1-1      mir-283          tca-miR-283        mir-87           tca-miR-87a
mir-1            tca-miR-1-2      mir-29           tca-miR-29         mir-87           tca-miR-87b
mir-10           tca-miR-10       mir-290          tca-miR-371-5p     mir-877          tca-miR-877
mir-10           tca-miR-10*      mir-305          tca-miR-305        mir-9            tca-miR-79
mir-12           tca-miR-12       mir-31           tca-miR-31         mir-9            tca-miR-9a
mir-124          tca-miR-124      mir-315          tca-miR-315        mir-9            tca-miR-9b
mir-125          tca-miR-125      mir-317          tca-miR-317        mir-9            tca-miR-9c
mir-130          tca-miR-130b     mir-33           tca-miR-33a        mir-99           tca-miR-100
mir-133          tca-miR-133      mir-34           tca-miR-34         mir-iab-4        tca-miR-iab-4-3p
mir-137          tca-miR-137a     mir-375          tca-miR-375        mir-iab-4        tca-miR-iab-4-5p
mir-137          tca-miR-137b     mir-46           tca-miR-281-3p     none             tca-miR-261b
mir-139          tca-miR-139      mir-46           tca-miR-281-5p     none             tca-miR-262
mir-14           tca-miR-14       mir-467          tca-miR-466a       none             tca-miR-265
mir-146          tca-miR-146      mir-467          tca-miR-466b       none             tca-miR-277
mir-15           tca-miR-195      mir-467          tca-miR-466c-1-3p  none             tca-miR-279a
mir-154          tca-miR-369-5p   mir-467          tca-miR-466c-1-5p  none             tca-miR-279b
mir-17           tca-miR-17a      mir-467          tca-miR-466c-2     none             tca-miR-553a
mir-17           tca-miR-17b      mir-467          tca-miR-466c-3     none             tca-miR-553c
mir-17           tca-miR-18       mir-499          tca-miR-499        none             tca-miR-559
mir-184          tca-miR-184      mir-506          tca-miR-507a       none             tca-miR-562
mir-187          tca-miR-187      mir-506          tca-miR-507c-1     none             tca-miR-620
mir-190          tca-miR-190      mir-506          tca-miR-507c-2     none             tca-miR-699
mir-2            tca-miR-13a      mir-506          tca-miR-507d       none             tca-miR-701
mir-2            tca-miR-13b      mir-506          tca-miR-507e-1     none             tca-miR-737-2
mir-2            tca-miR-2-1      mir-506          tca-miR-507e-2     none             tca-miR-754
mir-2            tca-miR-2-2      mir-515          tca-miR-516b*      none             tca-miR-755
mir-2            tca-miR-2-3      mir-515          tca-miR-522        none             tca-miR-792
mir-2            tca-miR-2b*      mir-540          tca-miR-540        none             tca-miR-798
mir-210          tca-miR-210a     mir-568          tca-miR-568a-1     none             tca-miR-925
mir-210          tca-miR-210b     mir-568          tca-miR-568a-2     none             tca-miR-927
mir-219          tca-miR-219      mir-568          tca-miR-568a-3     none             tca-miR-929
mir-25           tca-miR-92a      mir-568          tca-miR-568c       none             tca-miR-932
mir-25           tca-miR-92b      mir-67           tca-miR-307        none             tca-miR-936
mir-25           tca-miR-92c      mir-7            tca-miR-7
mir-250          tca-miR-250      mir-71           tca-miR-71
An asterisk indicates a minor miR* sequence.
genomes, we expected to see different members of the same miRNA family residing in duplicated regions of the genome. However, we did not find any miRNA family arising from apparent tandem or segmental duplications except the mir-71 family, which contains three members with highly similar flanking sequences. This result suggests that most of the miRNA genes in the red flour beetle may not have evolved into multi-gene families through segmental or tandem duplications. The scarcity of conserved sequences surrounding miRNA members of the same family also indicates that either such duplication events are ancient or the members have been derived via unknown mechanisms. We also noticed that some miRNA genes cluster together and share the same orientation. This observation is consistent with the fact that, within animal species, miRNAs are commonly grouped into clusters in which multiple miRNAs are transcribed at the same time as one large polycistronic unit. Among our predicted miRNAs, 13 clusters can be found, each confined to a region of no more than 10 kb. The relationships of these clusters with their host transcripts and target genes are discussed in the following parts.
The intronic miRNA host genes in T. castaneum
Intronic miRNAs are a class of miRNAs derived from processed introns. They have been identified in several species, but the percentage of miRNAs residing in introns varies across species, ranging from 7.4% in zebrafish to 25.3% in human (Li et al., 2007) and 26.3% in red flour beetle (31 out of 118). The miRNAs processed from introns of protein-coding host genes are likely to share regulatory elements and Pol II primary transcripts with their host genes, and are then cleaved by two RNase III enzymes, Drosha and Dicer (Bartel, 2004).
Considering the possibility of co-expression between intronic miRNAs and their host genes, we first investigated each host gene individually. We found that miRNAs may tend to reside in genes related to the transcription process. For example, the host genes of miR-33a and miR-754 have transcription factor activity. Moreover, the host genes of miR-7 and miR-925 have functional mRNA and exhibit transcription factor binding activity. We also noticed that several species, including Gallus gallus, Macaca mulatta, Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Danio rerio, have mir-190 embedded within introns of different order in talin2 (TLN2), owing to the gene's large size of more than 200 kb and its alternatively spliced variants, which give rise to multiple transcripts (Monkley et al., 2001). The protein product of the mir-190 host gene functions as a linker between integrins and the sarcomeric cytoskeleton in adhesion complexes in mature striated muscle, and is inducible during striated muscle differentiation (Senetar et al., 2006).
As miRNA host genes encode proteins with a broad spectrum of biological roles, ranging from cytoskeleton organization to cell proliferation and signal transduction, we surveyed all miRNAs and their host genes in the Gene Ontology (GO) database (Gene Ontology Consortium 2001; http://www.geneontology.org/) to gain a perspective on the classes of protein-coding genes possibly modulated by their coordinated miRNAs. The two most commonly identified categories are physiological process (GO: 00050875; 83%) and metabolism (GO: 0008152; 56%). The most common molecular function is protein binding (GO: 0043393; 72%). Recently, Ruby et al. (2007) showed that there is another processing route for intronic miRNAs, in which miRNAs arise not from the canonical miRNA biogenesis pathway but from an alternative pathway that relies on RNA splicing rather than Drosha. These intronic microRNA precursors that bypass Drosha processing are named mirtrons. However, in our survey of intronic miRNAs in T. castaneum, we did not find any mirtrons, because the shortest intron that can form a hairpin structure is 183 bp, which is longer than a classical mirtron. Furthermore, there is also evidence that Drosha may cleave intronic miRNAs between the splicing commitment step and the excision step of mRNA post-transcriptional processing, thereby ensuring both miRNA biogenesis and protein synthesis from a single primary transcript, which conflicts with the conventional co-expression model (Kim and Kim, 2007). It is therefore likely that, in T. castaneum, intronic miRNAs and their host genes undergo complex post-transcriptional processing.
Clustering of miRNA genes in T. castaneum
Since the existence of miRNA clusters was first noted (Lagos-Quintana et al., 2001; Lau et al., 2001), a previous study has demonstrated that over half of the known Drosophila miRNAs form clusters (Aravin et al., 2003), although the majority of worm and human miRNA genes lack evidence of clustering (Lim et al., 2003a, 2003b). We investigated the locations of all 118 predicted miRNAs and found 11 miRNA clusters in the T. castaneum genome. Among the 11 clusters, 8 were found within protein-coding genes. There is only one case where clustered miRNAs are located in different introns of the same gene. The other 7 clusters were further investigated for their correlation with, and regulation of, the corresponding target genes in related species. For example, miR-12 and miR-283 cluster together; they also exist as a cluster in honey bee and fruit fly, and the intronic miR-283/12/304 cluster in fruit fly has been shown to be expressed in the embryonic peripheral nervous system (Aboobaker et al., 2005; Weaver et al., 2007). Furthermore, miR-277/miR-317/miR-34 occur in the same intron of 658676, a gene similar to CG11246-PA, which is a core component of the RNA polymerase II complex, and these three miRNAs also form clusters in the same pattern in the honey bee and fruit fly genomes (Weaver et al., 2007). We identified a cluster containing two highly homologous members of the miR-92 family, miR-92a and miR-92b. These two miRNAs are also present in Drosophila, where they are located within 5,000 nt of each other on chromosome 3R. It is likely that this cluster arose through recent gene duplication followed by sequence diversification (Aravin et al., 2003; Tanzer and Stadler, 2004). The largest cluster, composed of six members, miR-13a/miR-13b/miR-2-1/miR-2-2/miR-2-3/miR-71c, is located in the second intron of a gene similar to serine/threonine-protein phosphatase 4 regulatory subunit 1. MiR-13 and miR-2 belong to the miR-2 family because they differ by only two bases. The miR-2 and miR-13 genes also form a cluster in the genomes of Anopheles gambiae and Apis mellifera; the former contains four members of the miR-2/miR-13 family in a single cluster (Lai et al., 2003; Behura, 2007). In contrast, Drosophila genomes contain eight members of this family, but they are separated into four distinct genomic locations on three different chromosomes (Lai et al., 2003). MiR-2 and miR-13 potentially function in the programmed cell death pathway (Wienholds and Plasterk, 2005). The pro-apoptotic genes reaper, grim, and sickle have been identified as targets of miR-2 family miRNAs, suggesting that miR-2 may be involved in the control of apoptosis (Stark et al., 2003). Furthermore, the knockdown of miR-2 or miR-13 by antisense-mediated inactivation in Drosophila embryos resulted in developmental defects (Boutla et al., 2003; Wienholds and Plasterk, 2005). Another interesting phenomenon is that miR-1 and miR-133 are consistently clustered together throughout the phylogenetic tree, with at least 7 species including T.
castaneum, Xenopus tropicalis, Gallus gallus, Monodelphis domestica, Homo sapiens, Mus musculus, and Rattus norvegicus having this miR-1/miR-133 cluster. Moreover, they are all located on the antisense strand of the same intron (intron 12) of the same gene, mind bomb homolog 1 (MIB1). MIB1 is a ubiquitin ligase that is essential in signaling cells for efficient activation of the Notch pathway in neighboring cells; it interacts with the intracellular domain of Delta to promote its ubiquitylation and internalization (Itoh et al., 2003). In Drosophila, miR-1 functions in the Notch pathway as it targets the Notch ligand, Delta (Kwon et al., 2005). In Xenopus laevis embryos, there is evidence indicating that miR-1 promotes muscle differentiation by targeting histone deacetylase 4 (HDAC4) (Chen et al., 2006), while HDAC4 is also involved in the Notch signaling pathway, acting as a transcriptional repressor. These findings hint that the conserved co-localization of the miR-1/miR-133 cluster and its antisense gene may reflect a coordinated relationship rather than a random arrangement, which would be unlikely to maintain this cohabitation pattern over such a long evolutionary time. In another polycistronic microRNA cluster, miR-195 and miR-559 lie in the same large intron, 20 kb apart from each other, but they may be co-expressed if they are transcribed together by RNA polymerase II. MiR-200* resides close to miR-8, within a 60-nt space. The function of conserved miRNAs within the same cluster remains to be revealed, although they are transcribed together with their host transcripts by RNA polymerase II/III and processed into pre-miRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001). Based on the above observations, it is reasonable to suggest that these operon-like clusters may play an important role in the RNA regulatory network. Similar to operons in prokaryotes, which can form negative feedback loops in development and growth, operon-like clusters in eukaryotes may have partly preserved this ancient mechanism throughout evolutionary history. The existence of an alternative mechanism of co-regulation within operon-like clusters in eukaryotes cannot be excluded, though, and further research will be required to verify this hypothesis.
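The positional notion of a cluster used in this section (miRNAs in the same orientation within a limited genomic span) can be sketched as a simple grouping over sorted loci. This is an illustrative assumption about how such clusters might be called, not the procedure actually used: the greedy anchoring on the first member, the `max_span` parameter, and all coordinates below are hypothetical.

```python
from typing import List, Tuple

# (name, scaffold, strand, start, end); coordinates are 0-based, end-exclusive.
Locus = Tuple[str, str, str, int, int]


def cluster_mirnas(loci: List[Locus], max_span: int = 10_000) -> List[List[str]]:
    """Group miRNAs on the same scaffold and strand whose total span is <= max_span.

    A cluster is opened at the leftmost locus and extended greedily while the
    distance from that first start to the current end stays within max_span.
    Only groups with at least two members are reported.
    """
    clusters: List[List[str]] = []
    current: List[Locus] = []
    for locus in sorted(loci, key=lambda l: (l[1], l[2], l[3])):
        same_unit = (bool(current)
                     and current[0][1] == locus[1]      # same scaffold
                     and current[0][2] == locus[2]      # same strand
                     and locus[4] - current[0][3] <= max_span)
        if same_unit:
            current.append(locus)
        else:
            if len(current) > 1:
                clusters.append([l[0] for l in current])
            current = [locus]
    if len(current) > 1:
        clusters.append([l[0] for l in current])
    return clusters


# Made-up coordinates; only the miRNA and scaffold names come from Table 1.
loci = [
    ("tca-miR-12", "TcaLG3", "+", 1_000, 1_090),
    ("tca-miR-283", "TcaLG3", "+", 2_500, 2_590),
    ("tca-miR-7", "TcaLG7", "+", 500, 590),
    ("tca-miR-263b", "TcaLG3", "-", 3_000, 3_090),
]
print(cluster_mirnas(loci))
```

Raising `max_span` (for example to 25,000) would be needed to capture pairs that lie farther apart within one large intron, such as the miR-195/miR-559 pair described above.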
Acknowledgements
This work was supported by the National Basic Research Program of China from the Ministry of Science and Technology awarded to Jun Yu and Songnian Hu (No. 2006CB910400).
Supplementary Data
The Supplementary Table associated with this article can be found in the online version at www.jgenetgenomics.org.
References
Aboobaker, A.A., Tomancak, P., Patel, N., Rubin, G.M., and Lai, E.C. (2005). Drosophila microRNAs exhibit diverse spatial expression patterns during embryonic development. Proc. Natl. Acad. Sci. USA 102: 18017−18022.
Adai, A., Johnson, C., Mlotshwa, S., Archer-Evans, S., Manocha, V., Vance, V., and Sundaresan, V. (2005). Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res. 15: 78−91.
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., and Tuschl, T. (2003). A uniform system for microRNA annotation. RNA 9: 277−279.
Aravin, A.A., Lagos-Quintana, M., Yalcin, A., Zavolan, M., Marks, D., Snyder, B., Gaasterland, T., Meyer, J., and Tuschl, T. (2003). The small RNA profile during Drosophila melanogaster development. Dev. Cell 5: 337−350.
Bartel, D.P. (2004). MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116: 281−297.
Behura, S.K. (2007). Insect microRNAs: Structure, function and evolution. Insect Biochem. Mol. Biol. 37: 3−9.
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E., Sharon, E., Spector, Y., and Bentwich, Z. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nat. Genet. 37: 766−770.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H., and Cuppen, E. (2005). Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120: 21−24.
Borchert, G.M., Lanier, W., and Davidson, B.L. (2006). RNA polymerase III transcribes human microRNAs. Nat. Struct. Mol. Biol. 13: 1097−1101.
Boutla, A., Delidakis, C., and Tabler, M. (2003). Developmental defects by antisense-mediated inactivation of micro-RNAs 2 and 13 in Drosophila and the identification of putative target genes. Nucleic Acids Res. 31: 4973−4980.
Brown, S.J., Denell, R.E., and Beeman, R.W. (2003). Beetling around the genome. Genetic Res. 82: 155−161.
Chen, J.F., Mandel, E.M., Thomson, J.M., Wu, Q., Callis, T.E., Hammond, S.M., Conlon, F.L., and Wang, D.Z. (2006). The role of microRNA-1 and microRNA-133 in skeletal muscle proliferation and differentiation. Nat. Genet. 38: 228−233.
Denman, R.B. (1993). Using RNAFOLD to predict the activity of small catalytic RNAs. Biotechniques 15: 1090−1095.
Grad, Y., Aach, J., Hayes, G.D., Reinhart, B.J., Church, G.M., Ruvkun, G., and Kim, J. (2003). Computational and experimental identification of C. elegans microRNAs. Mol. Cell 11: 1253−1263.
Griffiths-Jones, S. (2004). The microRNA registry. Nucleic Acids Res. 32: D109−111.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34: D140−144.
Hubbard, S.J., Grafham, D.V., Beattie, K.J., Overton, I.M., McLaren, S.R., Croning, M.D., Boardman, P.E., Bonfield, J.K., Burnside, J., Davies, R.M., Farrell, E.R., Francis, M.D., Griffiths-Jones, S., Humphray, S.J., Hyland, C., Scott, C.E., Tang, H., Taylor, R.G., Tickle, C., Brown, W.R., Birney, E., Rogers, J., and Wilson, S.A. (2005). Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags. Genome Res. 15: 174−183.
Itoh, M., Kim, C.H., Palardy, G., Oda, T., Jiang, Y.J., Maust, D., Yeo, S.Y., Lorick, K., Wright, G.J., Ariza-McNaughton, L., Weissman, A.M., Lewis, J., Chandrasekharappa, S.C., and Chitnis, A.B. (2003). Mind bomb is a ubiquitin ligase that is essential for efficient activation of Notch signaling by Delta. Dev. Cell 4: 67−82.
Kim, V.N., and Nam, J.W. (2006). Genomics of microRNA. Trends Genet. 22: 165−173.
Kim, Y.K., and Kim, V.N. (2007). Processing of intronic microRNAs. EMBO J. 26: 775−783.
Kwon, C., Han, Z., Olson, E.N., and Srivastava, D. (2005). MicroRNA1 influences cardiac differentiation in Drosophila and regulates Notch signaling. Proc. Natl. Acad. Sci. USA 102: 18986−18991.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294: 853−858.
Lai, E.C., Tomancak, P., Williams, R.W., and Rubin, G.M. (2003). Computational identification of Drosophila microRNA genes. Genome Biol. 4: R42.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294: 858−862.
Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862−864.
Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 23: 4051−4060.
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., and Kim, V.N. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415−419.
Li, S.C., Tang, P., and Lin, W.C. (2007). Intronic microRNA: Discovery and biological implications. DNA Cell Biol. 26: 195−207.
Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a). Vertebrate microRNA genes. Science 299: 1540.
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev. 17: 991−1008.
Lindow, M., and Krogh, A. (2005). Computational evidence for hundreds of non-conserved plant microRNAs. BMC Genomics 6: 119.
Monkley, S.J., Pritchard, C.A., and Critchley, D.R. (2001). Analysis of the mammalian talin2 gene TLN2. Biochem. Biophys. Res. Commun. 286: 880−885.
Prince, V.E., and Pickett, F.B. (2002). Splitting pairs: The diverging fates of duplicated genes. Nature Rev. 3: 827−837.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. (2004). Identification of mammalian microRNA host genes and transcription units. Genome Res. 14: 1902−1910.
Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors that bypass Drosha processing. Nature 448: 83−86.
Senetar, M.A., Moncman, C.L., and McCann, R.O. (2007). Talin2 is induced during striated muscle differentiation and is targeted to stable adhesion complexes in mature muscle. Cell Motil. Cytoskeleton 64: 157−173.
Stark, A., Brennecke, J., Russell, R.B., and Cohen, S.M. (2003). Identification of Drosophila microRNA targets. PLoS Biol. 1: E60.
Tanzer, A., and Stadler, P.F. (2004). Molecular evolution of a microRNA cluster. J. Mol. Biol. 339: 327−335.
Weaver, D.B., Anzola, J.M., Evans, J.D., Reid, J.G., Reese, J.T., Childs, K.L., Zdobnov, E.M., Samanta, M.P., Miller, J., and Elsik, C.G. (2007). Computational and transcriptional evidence for microRNAs in the honey bee genome. Genome Biol. 8: R97.
Weber, M.J. (2005). New human and mouse microRNA genes found by homology search. FEBS J. 272: 59−73.
Wienholds, E., and Plasterk, R.H. (2005). MicroRNA function in animal development. FEBS Lett. 579: 5911−5922.
Yan, X., Chao, T., Tu, K., Zhang, Y., Xie, L., Gong, Y., Yuan, J., Qiang, B., and Peng, X. (2007). Improving the prediction of human microRNA target genes by using ensemble algorithm. FEBS Lett. 581: 1587−1593.
Ying, S.Y., and Lin, S.L. (2004). Intron-derived microRNAs-fine tuning of gene functions. Gene 342: 25−28.
Ying, S.Y., and Lin, S.L. (2005). Intronic microRNAs. Biochem. Biophys. Res. Commun. 326: 515−520.
Ying, S.Y., and Lin, S.L. (2006). Current perspectives in intronic micro RNAs (miRNAs). J. Biomed. Sci. 13: 5−15.
Yoon, S., and de Micheli, G. (2006). Computational identification of microRNAs and their targets. Birth Defects Res. C Embryo Today 78: 118−128.
Yousef, M., Nebozhyn, M., Shatkay, H., Kanterakis, S., Showe, L.C., and Showe, M.K. (2006). Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier. Bioinformatics 22: 1325−1334.
Zhang, B., Pan, X., Cannon, C.H., Cobb, G.P., and Anderson, T.A. (2006). Conservation and divergence of plant microRNA genes. Plant J. 46: 243−259.