Computational Biology and Chemistry 33 (2009) 62–70
Research Article
Computational identification of novel microRNA homologs in the chimpanzee genome Vesselin Baev ∗,1 , Evelina Daskalova 1 , Ivan Minkov Department of Plant Physiology and Molecular Biology, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria
Article info
Article history: Received 27 May 2008; Accepted 16 July 2008
Keywords: miRNA; Pan troglodytes; Orthologs; Bioinformatics

Abstract
MicroRNAs are important negative regulators of gene expression in higher eukaryotes. The miRNA repertoire of the closest living relative of humans, the chimpanzee (Pan troglodytes), is largely unknown. In this study, we focused on a computational search for novel miRNA homologs in chimpanzee. We searched for and analyzed the chimp homologs of the human pre-miRNA and mature miRNA sequences. Based on a homology search of the chimpanzee genome with human miRNA precursor sequences as queries, we identified 639 chimp miRNA genes, including 539 novel chimp miRNAs. 91.8% of chimp mature miRNAs and 60.3% of precursors are 100% identical to their human orthologs. The pre-miRNA secondary structures, miRNA families, and clusters are also highly conserved. We also found certain sequence differences in pre-miRNAs and even mature miRNAs that occurred after the divergence of the two species. Some of these differences (especially in mature miRNAs) could have caused species-specific changes in the expression levels of their target genes, which in turn could have resulted in phenotypic variation between human and chimp. © 2008 Elsevier Ltd. All rights reserved.
∗ Corresponding author (V. Baev). Tel.: +359 894380945; fax: +359 629495.
1 These authors contributed equally.
1476-9271/$ – see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2008.07.024

1. Introduction
MicroRNAs (miRNAs) are small (21–22 nt) endogenous non-coding RNAs (ncRNAs) that act as negative regulators of gene expression. They are found in the genomes of all multicellular eukaryotes analyzed so far. Many animal miRNA genes are phylogenetically conserved; however, there are also many non-conserved, species-specific miRNAs that might have been important in the emergence of phenotypic variation in closely related species.

miRNA genes are transcribed into primary miRNAs (pri-miRNAs) by RNA polymerase II. The pri-miRNAs are then processed by a microprocessor complex into ∼70 nt pre-miRNAs with a specific hairpin structure (Cai et al., 2004; Lee et al., 2002, 2004). The pre-miRNAs are then transported into the cytoplasm, where the RNase III enzyme Dicer cleaves them into ∼22 nt mature miRNAs. Each mature miRNA is loaded into a protein complex called the RNA-induced silencing complex (RISC), which the miRNA guides to its regulatory targets. In nearly all known cases, these targets are located in the 3′ untranslated regions (3′ UTRs) of cellular mRNAs (Ambros, 2004). If there is a perfect match between the miRNA and its mRNA target, the latter is subjected to degradation; if the match is imperfect, translation of the target is blocked (Bartel, 2004; Cuellar and McManus, 2005). A gene may have several target sites for the same or different miRNAs, and a single miRNA can interact with more than a hundred genes (Miranda et al., 2006). This is the basis for the emergence of highly interconnected and complex miRNA-based regulatory networks, which open additional possibilities for gene regulation in multicellular eukaryotes.

Experimental studies reveal that miRNAs play important roles in development, signalling, apoptosis, cell fate and differentiation. Deregulation of these small RNAs can trigger tumor initiation and progression by switching on inappropriate molecular programs that lead to uncontrolled proliferation. Recent studies have provided strong evidence for miRNA regulation of many essential cancer-related genes, including BCL2, RAS, MYC and p53 (Cimmino et al., 2005; Esquela-Kerscher and Slack, 2006; Grosshans et al., 2005; Johnson et al., 2005; Koscianska et al., 2007; O’Donnell et al., 2005).

Some miRNAs are present in the genome as clusters, in which two or more miRNAs occupy neighbouring positions within a few kilobases of each other and are transcribed as a polycistronic unit. These polycistronic miRNAs may have cooperative functions (Lagos-Quintana et al., 2001, 2003). According to Altuvia et al. (2005), at a 10 kb distance threshold up to 48% of known human miRNA genes form clusters in the human genome. Many vertebrate miRNA clusters are highly conserved (Altuvia et al., 2005).

Human miRNAs are relatively well studied. To date, 678 human miRNAs have been discovered, and the functions of some of them have been computationally predicted and/or experimentally
validated; they are indexed in miRBase, the most prominent miRNA-centered database (http://www.microrna.sanger.ac.uk/; Griffiths-Jones, 2004; Griffiths-Jones et al., 2008). In contrast, most primate miRNA lists seem to be far from complete. For example, until very recently only 71 miRNAs from the rhesus monkey (Macaca mulatta) were registered in miRBase. Although the rhesus monkey is a model organism for studying human physiology and its genome is completely sequenced, there are no data on rhesus miRNA function so far. A recent study undertook a thorough bioinformatics-based search for novel miRNA homologs in the rhesus genome and reported a total of 454 miRNAs, including the 71 previously deposited (Yue et al., 2008).

The miRNA repertoire of the closest living relative of humans, the chimpanzee, is also largely unknown. Investigation of chimpanzee miRNAs will expand our understanding of their biological function and can clarify the evolution of these small RNA molecules in closely related species. To date, only 100 miRNAs from chimpanzee (Pan troglodytes) and 89 miRNAs from bonobo (Pan paniscus) are deposited in miRBase, most of them originating from the studies of Berezikov et al. (2005, 2006). The chimpanzee genome was completely sequenced in 2005 (Chimpanzee Sequencing and Analysis Consortium, 2005), but there are almost no data on the expression and function of chimp miRNAs.

In this study, we focused on a computational search for novel miRNA homologs in chimpanzee. We searched for and analyzed the chimp homologs of the human pre-miRNA and mature miRNA sequences and identified hundreds of 100% and near-100% identical miRNAs. We also analyzed all human miRNA clusters and confirmed the presence of all but one of them in the chimpanzee genome. The 100% and near-100% identity determined in this study allows us to predict that the newly discovered chimp orthologs are functional. Additionally, we noted some cases of divergence between the miRNA repertoires of the two genomes. Some of these differences may play a role in the human-chimp phenotypic divergence.

2. Materials and Methods

2.1. Human miRNA Sequences
Human mature and pre-miRNA sequences were obtained from the Wellcome Trust Sanger Institute’s miRBase, release 11 (http://www.microrna.sanger.ac.uk; Griffiths-Jones et al., 2008). The sequences were stored in multi-FASTA format and used for further analyses.
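The multi-FASTA storage described in Section 2.1 can be illustrated with a minimal Python sketch of a reader. This is not the authors' code: the function name `read_fasta` and the two demo records are hypothetical, and the demo sequences are the mature let-7a and mir-15a sequences listed in Table 3, used only as stand-ins for real miRBase entries.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into a {record_id: sequence} dictionary.

    The record id is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


# Hypothetical demo input (illustrative headers, not miRBase records).
demo = """>demo-let-7a illustrative record
UGAGGUAGUAGG
UUGUAUAGUU
>demo-mir-15a
UAGCAGCACAUAAUGGUUUGUG
""".splitlines()

records = read_fasta(demo)
for record_id, seq in records.items():
    print(record_id, len(seq), seq)
```

Keeping the records in a dictionary keyed by identifier makes it straightforward to look up a human query by name when its chimp hits are processed later.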
2.2. Chimpanzee Genome
The chimpanzee genome was downloaded from the UCSC Genome Browser website (http://www.genome.ucsc.edu; Karolchik et al., 2008). We used Build 2 Version 1, Oct. 2005 of the genome. The sequences were downloaded as one FASTA file per chromosome from the UCSC FTP server (http://www.hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/). Some files (named chrN_random.fa) contain clones which are either not yet finished or cannot be placed with certainty at a specific location on chromosome N.

2.3. Extracting Putative miRNA Precursor Regions from Chimpanzee Genome
The chimpanzee genome was subjected to BLAT analysis (Kent, 2002). A stable version of the BLAT tool was downloaded from the UCSC site (http://www.genome-test.cse.ucsc.edu/∼kent/exe/linux/) and run locally under Linux. From the BLAT results, a BED file was created that included coordinates for all homology regions. The structure of the BED file was as described on the UCSC site (http://www.genome.ucsc.edu/FAQ/FAQformat) and contains: chromosome number, start and end position of the region, name of the region, score and strand. The BED file was uploaded into the Galaxy 2 tool, linked to the PanTro2 genome, and the sequences were fetched in FASTA format (Giardine et al., 2005).

2.4. Predicting Secondary Structures of Putative Chimp Precursors and Human Pre-miRNAs
All precursor sequences were piped to a Perl script which calls, as an external process, the ViennaRNA secondary structure prediction software by Ivo Hofacker, version 1.5 (http://www.tbi.univie.ac.at/RNA/; Hofacker, 2003). The output of the script consisted of PostScript (PS) image files and a CSV file containing several features: the sequence name, the sequence, its secondary structure (in dot-and-bracket format) and its free energy. Since different software tools for RNA secondary structure prediction produce slightly different output, we performed an additional de novo 2D analysis of all human pre-miRNAs. This simultaneous folding facilitated the manual comparison of the structures from both species. All entries that did not form hairpin structures were discarded. The threshold free energy for the precursor sequences was −25 kcal/mol, but we also manually examined regions with higher energy when their energy values were larger than that of the human precursor ortholog. For the graphical representation of miRNA precursors we used web-based Mfold (http://www.frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/rna-form1.cgi; Zuker, 2003).

2.5. BLAST Analyses
To check for the presence of mature miRNAs in precursor sequences, a BLAST analysis was performed (we used BLAST
Table 1
miRNA genes in the chimpanzee genome

                                   Novel miRNA genes    Reported miRNA genes    Human pre-miRNAs with no chimp ortholog
Category                           Number    Percent    Number    Percent       Number    Percent (a)
100% identical pre-miRNA           325       60.29      75        75            –         –
85–99.9% identical pre-miRNA       214       39.71      25        25            –         –
100% identical mature miRNA (b)    495       91.84      97        97            –         –
Total                              539       100        100       100           39        5.75

(a) The percentage of miRNAs with no chimp ortholog is calculated relative to 678 (all human miRNAs deposited in miRBase).
(b) This category includes all 325 miRNA genes with 100% identical precursors, plus 170 of the 214 diverged precursors. In these 170 pre-miRNAs, the region of the mature miRNA is 100% identical to the human ortholog.
Fig. 1. Human miRNA clusters miR-17/92, miR-23/27 and miR-15/16 in the UCSC genome browser and chimp region conservation (green). The clusters are highly conserved between the human and chimpanzee genomes. The locations of the three clusters in the human genome are chr13:90,800,809-90,801,696, chr19:13,808,050-13,808,523 and chr3:161,605,019-161,605,357, respectively.
instead of the BLAT tool, because BLAT can find only sequences longer than 20 nt). A local copy of BLAST was obtained from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/blast/). We stored all records showing 100% identity with a human mature miRNA, as well as sequences of homologous precursors that lacked 100% identity with the mature miRNA (see Supplementary Table A, column “altered miRNA”).
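The check described in Section 2.5 was performed with BLAST. As a simplified stand-in for that step, the following Python sketch places a mature miRNA on a precursor by ungapped comparison and reports the number of mismatches, which is enough to separate 100% identical from altered mature regions. The function names and the flanking nucleotides in the demo precursors are hypothetical; only the two mature sequences (let-7a and let-7c) are taken from Table 3.

```python
from typing import Optional, Tuple


def to_rna(seq: str) -> str:
    """Normalise a sequence to upper-case RNA letters."""
    return seq.upper().replace("T", "U")


def locate_mature(precursor: str, mature: str) -> Optional[Tuple[int, int, int]]:
    """Find the best ungapped placement of `mature` on `precursor`.

    Returns (start, end, mismatches) with 1-based inclusive coordinates,
    or None if the precursor is shorter than the mature sequence.
    Ties are resolved in favour of the leftmost placement.
    """
    precursor, mature = to_rna(precursor), to_rna(mature)
    best = None
    for i in range(len(precursor) - len(mature) + 1):
        window = precursor[i:i + len(mature)]
        mismatches = sum(a != b for a, b in zip(window, mature))
        if best is None or mismatches < best[2]:
            best = (i + 1, i + len(mature), mismatches)
    return best


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # mature sequence from Table 3
LET7C = "UGAGGUAGUAGGUUGUAUGGUU"  # differs from let-7a at one position

# Hypothetical precursors: invented flanks around a mature region.
identical_hit = locate_mature("GGGA" + LET7A + "CCCU", LET7A)
altered_hit = locate_mature("GGGA" + LET7C + "CCCU", LET7A)
print(identical_hit, altered_hit)
```

Unlike BLAST, this sketch ignores insertions and deletions, so it is only suitable for short queries that are expected to align without gaps.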
2.6. miRNA Cluster Analysis
Information about miRNA gene clusters in the human genome was obtained via the miRGen cluster resource (http://www.diana.pcbi.upenn.edu/cgi-bin/miRGen/v3/Cluster.cgi). This web tool allows the use of a variable input parameter for the distance threshold, so clusters with different length and density of included
Table 2
Three miRNA clusters in the chimpanzee genome (1000 nt proximity) and their identity to human clusters

Chromosome   Direction   Cluster              miRNAs included   In chimp (a)   Identity (b)
4            (−)         hsa-mir-367-302      hsa-mir-367       Yes            100
                                              hsa-mir-302a      Yes            100
                                              hsa-mir-302b      Yes            100
                                              hsa-mir-302c      Yes            100
                                              hsa-mir-302d      Yes            100
13           (+)         hsa-mir-17-92-1      hsa-miR-17        Yes            100
                                              hsa-miR-18a       Yes            100
                                              hsa-miR-19a       Yes            100
                                              hsa-miR-19b-1     Yes            100
                                              hsa-miR-20a       Yes            100
                                              hsa-miR-92-1      Yes            100
14           (+)         hsa-mir-380-329-2    hsa-mir-380       Yes            100
                                              hsa-mir-323       Yes            100
                                              hsa-mir-758       Yes            100
                                              hsa-mir-329-1     Yes            100
                                              hsa-mir-329-2     Yes            100

(a) Presence of the miRNA gene in the chimpanzee genome.
(b) Percent identity of the chimp pre-miRNA with the human ortholog.
miRNA genes can be obtained. In this analysis, we chose a threshold value of 1000 nt.

3. Results

3.1. miRNA Genes in the Chimpanzee Genome
We performed a BLAT search of the chimpanzee genome using the human pre-miRNA sequences published in miRBase (Griffiths-Jones et al., 2008) as queries in order to find potential orthologous miRNA genes. The criteria for the chimpanzee orthologs to be reported as miRNA genes were as follows:
(1) More than 85% identity with the human pre-miRNA sequence.
(2) The chimp pre-miRNA ortholog contains the human mature miRNA. Our dataset contained all cases of 100% identical mature miRNAs, as well as cases where the precursor has a proper stem-loop structure and the mature miRNA region is slightly altered (contains single-nucleotide changes).
(3) Identical or similar secondary structures of the pre-miRNA hairpins in human and chimp.
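The three criteria above can be sketched as a small Python filter. This is an illustrative simplification rather than the authors' pipeline: the `Candidate` fields, the function names and the category labels are hypothetical, criterion (3) is approximated by a single-hairpin test on a dot-bracket string instead of the manual structure comparison used in the study, and the −25 kcal/mol value is the threshold stated in Section 2.4 (candidates above it are flagged for manual review rather than discarded).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    precursor_identity: float  # percent identity to the human pre-miRNA
    mature_mismatches: int     # mismatches inside the mature miRNA region
    structure: str             # dot-bracket structure of the chimp precursor
    energy: float              # predicted free energy in kcal/mol


def is_single_hairpin(structure: str) -> bool:
    """True if brackets are balanced in number and every '(' precedes every ')'."""
    if "(" not in structure:
        return False
    if structure.count("(") != structure.count(")"):
        return False
    return structure.rfind("(") < structure.find(")")


def classify(c: Candidate, min_identity: float = 85.0,
             max_energy: float = -25.0) -> str:
    """Assign a candidate to one of the illustrative categories."""
    if c.precursor_identity <= min_identity:   # criterion (1)
        return "rejected"
    if not is_single_hairpin(c.structure):     # proxy for criterion (3)
        return "rejected"
    if c.energy > max_energy:                  # weaker than the threshold
        return "manual review"
    if c.precursor_identity == 100.0:
        return "identical precursor"
    if c.mature_mismatches == 0:               # criterion (2), strict case
        return "identical mature miRNA"
    return "altered mature miRNA"              # criterion (2), relaxed case


# Hypothetical candidates, invented for illustration only.
demo = [
    Candidate("demo-1", 100.0, 0, "((((....))))", -30.0),
    Candidate("demo-2", 97.0, 0, "((..((...))..))", -28.0),
    Candidate("demo-3", 97.0, 1, "((..((...))..))", -28.0),
    Candidate("demo-4", 84.0, 0, "((((....))))", -30.0),
    Candidate("demo-5", 95.0, 0, "((..))((..))", -30.0),
    Candidate("demo-6", 95.0, 0, "((((....))))", -20.0),
]
labels = [classify(c) for c in demo]
print(labels)
```

The first three labels mirror the rows of Table 1 (100% identical precursor, diverged precursor with identical mature miRNA, and altered mature miRNA).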
Based on these criteria, we identified 639 putative chimpanzee miRNA genes, including all 100 previously reported miRNAs as well as 539 novel pre-miRNA sequences. Thirty-nine human pre-miRNAs appeared to have no chimpanzee ortholog. Of the 539 novel chimp miRNA genes, 325 (60.29%) share 100% identity with the human pre-miRNAs, and 214 (39.71%) share between 85% and 99.9% identity with human pre-miRNAs. Of the 100 chimp miRNAs previously identified and reported in miRBase, 75 were 100% identical to their human orthologs, and 25 shared 85–99.9% identity (Table 1). Most of the putative chimp pre-miRNA sequences with 85–99.9% identity contained 1, 2, or 3 mismatches compared to the human precursor sequences. The 678 human miRNA gene sequences used to search the chimp genome are available from miRBase. Of the discovered miRNAs, 286 are at orthologous positions in introns of protein-coding genes (intronic miRNA genes). The rest are located in other genomic regions, most of them being intergenic.

3.2. Secondary Structure Analysis
In order to analyze the conservation of secondary structure, we folded all human and chimp orthologous genes using the ViennaRNA and Mfold software. Interestingly, the secondary structure
Table 3
Two chimp miRNA families. The “Start” and “End” columns indicate the position of the pre-miRNA in the genome; “Mature start” and “Mature end” give the location of the mature miRNA within the precursor.

miRNA family    Chr.    Start        End          Strand   Mature start   Mature end   Mature sequence
Let-7
Ptr-let-7a-1    chr9    93394665     93394744     +        5              26           UGAGGUAGUAGGUUGUAUAGUU
Ptr-let-7a-2    chr11   121076348    121076419    −        5              26           UGAGGUAGUAGGUUGUAUAGUU
Ptr-let-7a-3    chr22   45296680     45296753     +        3              24           UGAGGUAGUAGGUUGUAUAGUU
Ptr-let-7c      chr21   16639743     16639826     +        10             31           UGAGGUAGUAGGUUGUAUGGUU
Ptr-let-7d      chr9    93397542     93397628     +        7              28           AGAGGUAGUAGGUUGCAUAGUU
Ptr-let-7e      chr19   57336391     57336469     +        7              28           UGAGGUAGGAGGUUGUAUAGUU
Ptr-let-7f-1    chr9    93395055     93395141     +        6              27           UGAGGUAGUAGAUUGUAUAGUU
Ptr-let-7f-2    chrX    53984503     53984585     −        8              29           UGAGGUAGUAGAUUGUAUAGUU
Ptr-let-7g      chr3    53524893     53524976     −        5              26           UGAGGUAGUAGUUUGUACAGUU
Ptr-let-7i      chr12   26907195     26907278     −        6              27           UGAGGUAGUAGUUUGUGCUGUU
Ptr-mir-98      chrX    53983535     53983653     −        22             43           UGAGGUAGUAAGUUGUAUUGUU
Mir-15
Ptr-mir-15a     chr13   49957953     49958035     −        14             35           UAGCAGCACAUAAUGGUUUGUG
Ptr-mir-15b     chr3    165480590    165480687    +        19             40           UAGCAGCACAUCAUGGUUUACA
Ptr-mir-16-1    chr13   49957807     49957895     −        14             35           UAGCAGCACGUAAAUAUUGGCG
Ptr-mir-16-2    chr3    165480747    165480827    +        9              30           UAGCAGCACGUAAAUAUUGGCG
Ptr-mir-195     chr17   7177857      7177943      −        15             35           UAGCAGCACAGAAAUAUUGGC
appeared to be even more conserved than the sequence. In almost all cases of divergence between human and chimp in the pre-miRNA, and even in the mature miRNA sequences, the secondary structure remains similar. Some reasons and examples are analyzed in Section 4.2.

Fig. 2. Compensatory change in stem element within orthologous miRNA precursors in human and chimp.

3.3. miRNA Gene Clusters in the Chimpanzee Genome
miRNA genes often form clusters in the genome (Lagos-Quintana et al., 2001, 2003). Clusters can be defined as miRNA genes that are present in the same orientation and are transcribed as one polycistronic transcriptional unit. At a 1000 nt distance threshold, the human genome contains 58 miRNA clusters with 143 miRNAs. These dense clusters are localized on chromosomes 1, 3, 4, 5, 7, 9, 11, 12, 13, 14, 17, 19, 20, 21, 22 and X. Using the miRGen cluster resource (http://www.diana.pcbi.upenn.edu/cgi-bin/miRGen/v3/Cluster.cgi), we downloaded these 58 human miRNA clusters and analyzed their chimp homologs. The orthologous miRNA clusters in the chimp genome were almost identical to those in human (Fig. 1). Three such examples are shown in Table 2. Of the 143 miRNAs that are co-localized in clusters at a 1000 nt threshold, 7 are missing and 113 are 100% identical to their human orthologs. The remaining 23 have between 92% and 99.8% identity (Supplementary Table B). Of the 58 miRNA gene clusters analyzed, all but one exist at a homologous position in the chimp genome (Supplementary Table B; Fig. 1). The only exception is the hsa-mir-132-212 cluster on chromosome 17, in which two miRNAs could not be identified at orthologous positions in the chimp genome. Five other miRNA cluster members at a 1000 nt threshold in human could not be identified in chimp: ptr-mir-200b, 200c, 222, 363, and 767. As discussed below, the most probable reason is the lower coverage of these regions by the chimp sequencing project. The sequence conservation of these regions of the human genome with rhesus, mouse and rat is high, as indicated in Fig. 1.

3.4. Chimpanzee miRNA Gene Families
Animal miRNAs are usually grouped into gene families. A typical miRNA family contains several mature miRNAs with identical or almost identical sequences (Bartel, 2004; Houbaviy et al., 2005; Li and Mao, 2007). In most cases, at least one member of the family has been experimentally identified, while the others have often been inferred as bioinformatics predictions based on sequence homology of the miRNA and the ability of the surrounding sequence to form a hairpin structure (Ambros et al., 2003; Zhang et al., 2006). Often, there are some differences at the ends of miRNAs belonging to one family. Generally it is not known whether these differences affect miRNA function. To date, the miRBase family classification contains 556 groups of miRNAs with members from different species. All known human miRNAs belong to 285 families (some of them have only one
Fig. 3. Changes in the mir-1181 sequence outside the mature miRNA region that do not affect secondary structure.
Fig. 4. Human and chimp mir-606. The mature miRNA region is flanked in red. The nucleotide difference within the mature sequences does not affect the whole structure of the pre-miRNA.
member), and all 100 previously reported chimp miRNAs belong to 63 of them. In the chimp genome, we found homologs for 276 human families (all 285 except 9; for more information, see Supplementary Table A and Table 3). In our searches, we could not find any of the 4 members of the miR-941 family. The other 8 missing families each contain a single miRNA. There are also five chimp families with one missing member (miR-92b, miR-888, miR-548o, miR-212, miR-222) and one family with two missing members (miR-200, with miR-200b and miR-200c missing).

3.5. Divergent Chimp Precursors and miRNA Genes
The majority (60.29%) of newly discovered chimp miRNA genes share 100% identity with their human orthologs. The rest of the chimp miRNA genes have 1–12 mismatches in their sequence compared to human. We examined whether the changes in chimp miRNA genes fall into the region of the mature miRNA sequence or into other regions of the precursor. Of the 214 novel miRNA genes with diverged precursors, 170 (79.44%) have 1–12 mismatches in the pre-miRNA, but the mature miRNA is 100% identical to the human ortholog. In summary, 495 of the chimp mature miRNAs (91.84% of all novel miRNAs discovered in our analysis) have 100% identity with their human orthologs. Interestingly, the nucleotide changes in the precursor did not destroy the stem-loop structure of the precursor RNA. The 100% mature miRNA sequence identity with human suggests that these 170 genes may well be functional in chimpanzee. The remaining 44 genes have one or more mismatches in the mature miRNA. Three of the 25 experimentally validated chimp miRNAs previously reported in miRBase and confirmed in our study (ptr-mir-216a, ptr-mir-508 and ptr-mir-513b) also have single-nucleotide changes in the region of the mature miRNA. Therefore these 44 genes with altered mature miRNAs are also likely to be functional. Interestingly, the divergence between miRNA clusters is higher on chromosome X. Three of the five “missing” miRNAs and 6 of the 23 diverged miRNAs are localized on this chromosome.

3.6. Are There Unique Human miRNA Genes?
In our study we found 39 miRNA genes that have no chimp ortholog at the pre-miRNA level. We checked whether the “missing
Fig. 5. Comparison of human, chimp and rhesus let-7b precursors.
chimp miRNAs” are present in the rhesus genome. Fifteen of them are reported in miRBase for the rhesus genome, and 24 are missing in rhesus as well. The 24 miRNAs that are missing in both the chimp and rhesus genomes are: hsa-mir-200b, hsa-mir-483, hsa-mir-548o, hsa-mir-574, hsa-mir-629, hsa-mir-647, hsa-mir-659, hsa-mir-769, hsa-mir-941-1, hsa-mir-941-2, hsa-mir-941-3, hsa-mir-941-4, hsa-mir-1180, hsa-mir-1228, hsa-mir-1231, hsa-mir-1238, hsa-mir-1243, hsa-mir-1257, hsa-mir-1270, hsa-mir-1274a, hsa-mir-1301, hsa-mir-1302-5, hsa-mir-1304, and hsa-mir-1308. These miRNAs have a higher probability of being novel in the human lineage.

4. Discussion

4.1. miRNAs in Human and Chimp are Very Similar
Based on a homology search of the chimpanzee genome with human miRNA precursor sequences as queries, we identified 639 chimp miRNA genes, including 539 novel chimp miRNAs. They belong to the same families and are arranged in orthologous clusters at orthologous positions in the chimp and human genomes. The miRNA genes, families and clusters are all highly conserved between chimp and human. 60.29% of pre-miRNA sequences, as well as 91.84% of mature miRNAs, are 100% identical to their human orthologs. As expected, the chimp precursors that are 100% identical to human precursors form the same secondary structures. The degree of conservation is comparable to that of protein-coding exons, and in many cases is even higher. This high similarity between the miRNA repertoires of the two genomes confirms the close evolutionary relationship between them, as supported by the sequencing data (Chimpanzee Sequencing and Analysis Consortium, 2005). The high level of identity between human and chimp miRNA revealed in the current study in principle allows the functions of some chimp miRNAs to be predicted and will contribute to a better understanding of the molecular evolution of our closest living relative.

4.2. The Divergence and its Hypothetical Consequences
We also identified several differences between human and chimp in certain pre-miRNAs and even mature miRNAs. Some of the diverged miRNAs in chimp may have partially or completely lost their function as negative regulators of gene expression. As a result, some of their targets may have elevated expression levels in chimp compared to human. In other cases this could have happened in the human lineage, causing elevated expression of human genes compared to chimp. If fixed, these altered expression levels could result in phenotypic variation between the two species. It is also possible that some of the mutations in either lineage have increased the functionality of the miRNA (for example, by increasing its complementarity to the target mRNA). This would also result in altered levels of gene expression and phenotypic differences between human and chimp. As has emerged in recent years, major differences in phenotypes may be due to changes in regulatory sequences rather than in protein-coding exons (Pennisi, 2004). If this is true for miRNA-based regulation, even slight differences in these regulatory sequences could well have contributed to phenotypic differences that are difficult to explain based on the highly identical proteomes of human and chimp.

Apart from identical precursors, we have found 288 putative chimp pre-miRNA sequences for 170 miRNA genes that have 1–14 nucleotide changes only in the precursor region, leaving the mature
part 100% identical to human. These nucleotide changes can be divided into two categories: changes that do not affect the secondary structure of the precursor and changes that modify the structure. In some cases, such nucleotide changes have stabilized the folding, for example by transforming some loops or bulges into stem elements (mir-1283 in Fig. 2). It should be noted that in most cases of sequence difference the structures remain identical because the nucleotide alterations lie in unpaired RNA elements (loops and bulges). Interestingly, there are also some identical orthologous structures that have sequence differences within paired regions of the RNA. We can explain this preservation of structure by compensatory evolutionary changes (Fig. 3).

We found 88 putative precursor sequences for 44 miRNA genes where the sequence alteration was within the mature miRNA region, making it unique to chimpanzee. Moreover, in some cases the nucleotide change did not affect the precursor structure, which leads to the hypothesis that these chimp sequences may represent new miRNA family members or orthologs that have undergone evolutionary changes (Fig. 4). All these changes could have arisen after the divergence of chimp and human about 6 million years ago. After the divergence of these species, certain changes occurred independently in the chimp lineage while others occurred in the human lineage. The latter appears to be the case for the let-7b precursor of human, chimp and rhesus, where the difference appears only in human, indicating that the change occurred independently in the human lineage (Fig. 5).

In addition to the sequence differences in particular miRNAs, there is divergence between miRNA clusters. Interestingly, the divergence seems to be higher on chromosome X. Three of the five “missing” miRNAs and 6 of the 23 diverged miRNAs are localized on this chromosome. Recently, Zhang et al. (2007) also reported a fast-evolving X-linked miRNA cluster, supporting the idea that miRNAs on this chromosome evolve faster than those on autosomes.

In our study we found 39 miRNA genes that have no chimp ortholog at the pre-miRNA level. The low sequencing coverage of the chimpanzee genome in some regions could be the simplest explanation for these “missing” miRNAs. However, we cannot exclude the possibility that these miRNAs appeared independently in the human lineage (or, alternatively, disappeared in the chimp lineage) after the divergence of the two species. This is more probable for 24 of the 39 “missing” miRNAs. These 24 miRNAs cannot be found among the miRBase-deposited rhesus miRNA genes, which makes them possible candidates for novel human-specific miRNA genes.

As chimp miRNA genes were identified by comparison with their human orthologs, it is very probable that some novel chimpanzee-specific miRNAs that appeared after the divergence of human and chimp also exist but remain unidentified in our analysis.

4.3. Perspectives: Evolution of the Target Sites
miRNA genes are generally highly conserved in evolution; however, computational miRNA target predictions indicate that many lineage-specific miRNA binding sites exist. It seems that the evolution of target sites is much faster than the evolution of miRNAs. Recent studies reveal that 30–50% of non-conserved miRNA binding sites in the human genome might be functional when the mRNA and miRNA are expressed in the same tissue (Chen and Rajewsky, 2007). Therefore the rapid evolution of target sites, even if there is 100% identity between miRNAs, could also contribute to the establishment of species-specific expression and have phenotypic consequences.
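To make the point about target-site evolution concrete, the following Python sketch shows how a single nucleotide change in a 3′ UTR can abolish a candidate binding site even when the miRNA itself is unchanged. It relies on the commonly used seed rule (pairing of miRNA nucleotides 2–8), which is an assumption introduced here and not a method used in this study; the function names and both UTR fragments are invented for illustration, while the let-7a sequence is taken from Table 3.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA motif complementary to miRNA positions start..end.

    Positions are 1-based and inclusive; the motif is written 5' to 3'.
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(utr: str, mirna: str) -> List[int]:
    """Return 1-based start positions of exact seed matches in a UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i + 1 for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # mature sequence from Table 3

# Hypothetical UTR fragments: the second carries one substitution in the site.
utr_with_site = "AAACUACCUCAAA"
utr_mutated = "AAACUACGUCAAA"

print(seed_site(LET7A))
print(find_sites(utr_with_site, LET7A), find_sites(utr_mutated, LET7A))
```

Real target prediction tools combine seed pairing with further features such as site context and conservation, so exact seed matching should be read only as a first approximation.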
Acknowledgements
We thank Molly Megraw from the Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA, USA, for critical reading of the manuscript. This work is supported by the Bulgarian National Science Fund 2007/2008.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compbiolchem.2008.07.024.

References
Altuvia, Y., Landgraf, P., Lithwick, G., Elefant, N., Pfeffer, S., Aravin, A., Brownstein, M.J., Tuschl, T., Margalit, H., 2005. Clustering and conservation patterns of human microRNAs. Nucleic Acids Res. 33, 2697–2706.
Ambros, V., 2004. The functions of animal microRNAs. Nature 431, 350–355.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., Jewell, D., 2003. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr. Biol. 13, 807–818.
Bartel, D.P., 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H., Cuppen, E., 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120, 21–24.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., Plasterk, R.H., 2006. Diversity of microRNAs in human and chimpanzee brain. Nat. Genet. 38, 1375–1377.
Cai, X., Hagedorn, C.H., Cullen, B.R., 2004. Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10, 1957–1966.
Chen, K., Rajewsky, N., 2007. The evolution of gene regulation by transcription factors and microRNAs. Nat. Rev. Genet. 8, 93–103.
Chimpanzee Sequencing and Analysis Consortium, 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87.
Cimmino, A., Calin, G.A., Fabbri, M., Iorio, M.V., Ferracin, M., Shimizu, M., Wojcik, S.E., Aqeilan, R.I., Zupo, S., Dono, M., et al., 2005. miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc. Natl. Acad. Sci. U.S.A. 102, 13944–13949.
Cuellar, T.L., McManus, M.T., 2005. MicroRNAs and endocrine biology. J. Endocrinol. 187, 327–332.
Esquela-Kerscher, A., Slack, F.J., 2006. Oncomirs – microRNAs with a role in cancer. Nat. Rev. Cancer 6, 259–269.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al., 2005. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15, 1451–1455.
Griffiths-Jones, S., 2004. The microRNA Registry. Nucleic Acids Res. 32, D109–111.
Griffiths-Jones, S., Saini, H.K., van Dongen, S., Enright, A.J., 2008. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36, D154–158.
Grosshans, H., Johnson, T., Reinert, K.L., Gerstein, M., Slack, F.J., 2005. The temporal patterning microRNA let-7 regulates several transcription factors at the larval to adult transition in C. elegans. Dev. Cell 8, 321–330.
Hofacker, I.L., 2003. Vienna RNA secondary structure server. Nucleic Acids Res. 31, 3429–3431.
Houbaviy, H.B., Dennis, L., Jaenisch, R., Sharp, P.A., 2005. Characterization of a highly variable eutherian microRNA gene. RNA 11, 1245–1257.
Johnson, S.M., Grosshans, H., Shingara, J., Byrom, M., Jarvis, R., Cheng, A., Labourier, E., Reinert, K.L., Brown, D., Slack, F.J., 2005. RAS is regulated by the let-7 microRNA family. Cell 120, 635–647.
Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., et al., 2008. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36, D773–779.
Kent, W.J., 2002. BLAT – the BLAST-like alignment tool. Genome Res. 12, 656–664.
Koscianska, E., Baev, V., Skreka, K., Oikonomaki, K., Rusinov, V., Tabler, M., Kalantidis, K., 2007. Prediction and preliminary validation of oncogene regulation by miRNAs. BMC Mol. Biol. 8, 79.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., Tuschl, T., 2001. Identification of novel genes coding for small expressed RNAs. Science 294, 853–858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., Tuschl, T., 2003. New microRNAs from mouse and human. RNA 9, 175–179.
Lee, Y., Jeon, K., Lee, J.T., Kim, S., Kim, V.N., 2002. MicroRNA maturation: stepwise processing and subcellular localization. EMBO J. 21, 4663–4670.
Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., Kim, V.N., 2004. MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 23, 4051–4060.
Li, A., Mao, L., 2007. Evolution of plant microRNA gene families. Cell Res. 17, 212–218.
Miranda, K.C., Huynh, T., Tay, Y., Ang, Y.S., Tam, W.L., Thomson, A.M., Lim, B., Rigoutsos, I., 2006. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203–1217.
O’Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., Mendell, J.T., 2005. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839–843.
Pennisi, E., 2004. Searching for the genome’s second code. Science 306, 632–635.
Yue, J., Sheng, Y., Orwig, K.E., 2008. Identification of novel homologous microRNA genes in the rhesus macaque genome. BMC Genom. 9.
Zhang, B.H., Pan, X.P., Cox, S.B., Cobb, G.P., Anderson, T.A., 2006. Evidence that miRNAs are different from other RNAs. Cell Mol. Life Sci. 63, 246–254.
Zhang, R., Peng, Y., Wang, W., Su, B., 2007. Rapid evolution of an X-linked microRNA cluster in primates. Genome Res. 17, 612–617.
Zuker, M., 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415.