Functional evidence of post-transcriptional regulation by pseudogenes




Biochimie 93 (2011) 1916-1921


Biochimie journal homepage: www.elsevier.com/locate/biochi

Review

Enrique M. Muro*, Nancy Mah, Miguel A. Andrade-Navarro

Max-Delbrück Center for Molecular Medicine (MDC), Robert Rössle Str. 10, 13125 Berlin, Germany

Article info

Article history: Received 14 April 2011; Accepted 19 July 2011; Available online 27 July 2011

Keywords: Pseudogenes; ncRNA; Natural antisense transcripts; siRNA; miRNA

Abstract

Pseudogenes have been mainly considered as functionless evolutionary relics since their discovery in 1977. However, multiple mechanisms of pseudogene functionality have been proposed both at the transcriptional and post-transcriptional level. This review focuses on the role of pseudogenes as post-transcriptional regulators. Two lines of research have recently presented strong evidence of their potential function as post-transcriptional regulators of the corresponding parental genes from which they originate. First, pseudogene genomic sequences can encode siRNAs. Second, pseudogene transcripts can act as indirect post-transcriptional regulators decoying ncRNA, in particular miRNAs that target the parental gene. This has been demonstrated for PTEN and KRAS, two genes involved in tumorigenesis. The role of pseudogenes in disease has not been proven and seems to be the next research landmark. In this review, we chronicle the events following the initial discovery of the ‘useless’ pseudogene to its breakthrough as a functional molecule with hitherto unbeknownst potential to influence human disease. © 2011 Elsevier Masson SAS. All rights reserved.

* Corresponding author. Tel.: +49 30 9406 4227; fax: +49 30 9406 4240. E-mail address: [email protected] (E.M. Muro). 0300-9084/$ - see front matter © 2011 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.biochi.2011.07.024

1. Introduction

Pseudogenes are “genomic loci that resemble real genes, yet are considered to be biologically inconsequential because they harbor premature stop codons, deletions/insertions and frameshift mutations that abrogate their translation into functional proteins” [1]. Pseudogenes originate from gene templates (parental genes) either by retrotransposition of the parental gene’s mRNA (processed pseudogenes that have no introns and, in principle, no upstream DNA regulatory regions) or as the product of genome duplication (non-processed or duplicated pseudogenes, which may contain all the parental gene introns and their upstream DNA regulatory regions). DNA sequences of pseudogenes evolve faster than those of their respective parental genes due to mutations, insertions and deletions that prevent the production of a functional protein [2].

Parental genes of pseudogenes can be either currently active genes or genes that were only active in an ancient genome. In the latter case, pseudogenes are clearly “genomic relics” because they constitute the only remains of a once functional gene [3,4]. These “genomic relics” were once protein-coding genes that are no longer able to produce a functional protein. For example, some olfactory receptor (OR) genes in human were inactivated as the human olfactory ability became increasingly limited. While these human OR genes lost their protein-coding ability through pseudogenization, a large proportion of the orthologous OR genes remained functional in other mammals with superior olfactory capabilities [5,6]: about 400 protein-coding ORs remain in human, compared to 1000 in mouse [7]. Another example of a pseudogene whose parental gene is no longer extant involves the loss of the primate ability to synthesize vitamin C. L-gulonolactone oxidase (GULO), which is necessary for vitamin C synthesis, is present as a functional gene in most mammals, but it is a pseudogene in primates [8]. Other works have used pseudogenes for comparative studies, for example to study the loss of hemoglobin in different Antarctic icefishes [9]. More recently, we have used prokaryotic pseudogenes as markers of functionally less important genes to demonstrate for the first time that functionally less important genes tend to be located at the end of operons while the more important genes tend to be located toward operon starts [10].

Historically, pseudogenes were not considered functional because their transcripts were generally non-coding, which essentially equated to irrelevancy in a protein-centric world, where the old dogma simply viewed RNA as an intermediate molecule in the protein production process [11]. Taking into account the recent discoveries on the function of pseudogene transcripts that we will describe later, we will see that epithets such as “dead genes” or “junk DNA” are misnomers for pseudogenes. In this review, we will focus on studies that unearthed novel functions of transcribed pseudogenes, which eventually led to the discovery of their function as post-transcriptional regulators.


2. Pseudogene discovery

The first pseudogene was reported in 1977, in a genomic region coding for the oocyte-type 5S RNA of Xenopus laevis [12]. The name, pseudogene, was given because: “this homologous structure was nearly as long as, and almost exact of, the gene itself”. The pseudogene had a truncated 5′ end and 14 bp mismatches in comparison with its parental gene. In terms of its functionality, the possible role of the pseudogene as a “transcribed spacer” was then discussed, but the authors stressed that they thought pseudogenes were just “relics of evolution”, a term that since then has been frequently used.

The next three years witnessed the sequencing of pseudogenes from globin genes of different species (rabbit, human, mouse, goat and sheep) [13-19]. The evidence showed that although these pseudogenes had comparable motifs to those previously observed in annotated genes, including a transcription initiation site, mRNA 5′ capping motifs, start and stop codons, and canonical polyadenylation signals (PAS), frameshifts in the pseudogene introduced a number of premature stop codons, abrogating the translation of a full-length functional protein [19]. The fact that many globin genes possessed corresponding pseudogenes in five mammalian species intriguingly implied some kind of functionality for these new genomic sequences. In ref. [19] it was proposed, without evidence, that if pseudogenes had any functionality this could be due to the use of the transcription machinery in the production of useless transcripts, a process that was described as ‘diverting genes’. As an alternative, the same work proposed, for the first time, the possible function of pseudogenes as antigenes (sources of antisense transcripts).
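The way a frameshift introduces a premature stop codon, as described above for the globin pseudogenes, can be illustrated with a short Python sketch. The sequences are hypothetical toy examples (not real globin sequences), and the helper name `first_stop_codon` is an assumption introduced here purely for illustration.

```python
from typing import Optional

# The three stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon.

    The sequence is read in frame 0 (starting at its first base).
    Returns None if no complete in-frame stop codon is found.
    """
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical parental coding sequence: ATG GCA TTG ACC GGA TAA
parental = "ATGGCATTGACCGGATAA"

# Hypothetical pseudogene copy: a single-base deletion (the G at index 3)
# shifts the reading frame to ATG CAT TGA ...
pseudogene = parental[:3] + parental[4:]

print(first_stop_codon(parental))    # stop only at the final codon
print(first_stop_codon(pseudogene))  # premature stop after the frameshift
```

In this toy case the parental sequence reaches its stop codon only at the last codon, whereas the one-base deletion brings a TGA into frame at the third codon, truncating the hypothetical protein.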
At the time, the idea that pseudogenes had no functionality was entrenched in the minds of scientists: “a pseudogene is a DNA segment with high homology with a functional gene but containing nucleotide changes such as frameshift and nonsense mutations that prevent its expression” [20,21]. Reflecting this, the work reporting the first algorithm that calculated the rate of nucleotide substitution within pseudogenes bore the modest title: “pseudogenes as a paradigm of neutral evolution” [21]. At that time, it was observed that many pseudogenes had an intronless DNA sequence, so it was thought that their origin could be the result of reverse transcription [20]. It was later demonstrated (thanks to computational analyses of complete genome sequences of many organisms) that there are around five times more human processed pseudogenes than non-processed pseudogenes [22].

In 1985 Elio F. Vanin published a comprehensive review on the topic of pseudogene function [23]. He suggested that “pseudogene be used only to describe sequences found to be both related and defective”. Reflecting the predominance of processed pseudogenes over non-processed ones, the review focused on the former, reporting 17 pseudogenes at the time. Their genetic defects were predominantly shown to consist of point mutations and indels (insertions and deletions) that lead to a change in the reading frame resulting in premature in-frame stop codons. Some of these pseudogenes had none of these changes, e.g., the processed pseudogene in rat RC9 cytochrome c. The existence of processed pseudogenes with intact coding sequences raised an important question: could processed pseudogenes actually code for a functional protein? Due to the random genomic location of processed pseudogenes, it was thought unlikely that a pseudogene would be lucky enough to find itself in a promoter region capable of initiating its transcription.
5′ regulatory regions for pseudogenes such as the mouse rpL3204A [24] and the RC9 cytochrome c in rat [25] were noted, though no functional protein was ever found for those pseudogenes. Almost two decades passed before the first evidence of pseudogene translation to a functional protein was published in 2002: PGAM3, a protein coded from a (processed) pseudogene found in primate white blood cells (human, chimpanzee and macaque) [26]. Later, other examples of pseudogenes that are translated to truncated proteins were found, namely PsiCx43 [27] and NANOGP8 [28], which expand the regulatory possibilities of pseudogene transcripts.

3. The rise of pseudogene functionality

The next most notable novelty in the field was the discovery that the immune system from chicken, human and rabbit diversifies its response using DNA sequence from pseudogenes through somatic gene conversion mechanisms, by which a DNA segment from the pseudogene is transferred to another immune system gene without modifying the pseudogene sequence [29]. This is the case of the immunoglobulin VH gene segments [30]. Pseudogenes were identified as repositories of genetic variability but not as having a biological function per se.

It was not until 1999 that a pseudogene transcript was first reported to have an active biological function: the post-transcriptional regulation of neural nitric oxide synthase (nNOS) by an antisense transcript encoded by its own pseudogene (pseudoNOS) [31]. The pseudogene itself is a natural antisense transcript (NAT) that is 145 bp long and shares ~80% complementarity with respect to the parental gene’s transcript. This enabled its association with the mRNA of NOS, which prevented its translation and therefore regulated nNOS protein synthesis. This has consequences in neural intercellular signaling. Experimental verification was performed both in vitro and in vivo in Lymnaea stagnalis, a freshwater snail. The working hypothesis of a pseudogene acting as an antigene had been suggested previously ([19] and [32]), but here it was demonstrated for the first time. The second report describing biological activity of a pseudogene transcript appeared later, in 2003, for the pseudogene Makorin1-p1 [33]. This work had a large influence, triggering an important review [34], which we will describe below.
The study by Hirotsune and coworkers [33] concluded that a transcribed pseudogene from Makorin1 (Makorin1-p1) was regulating Makorin1 in mouse, even though some fragmented open reading frames impeded the protein translation of the pseudogene. The authors defined pseudogenes to be “a gene copy that does not produce a functional full length protein” [33]. It was observed that a transgene-insertion mutant mouse showed bone deformity and polycystic kidneys, and it was claimed that the insertion reduced the transcription of Makorin1-p1, which was imprinted and affected the mRNA regulation of Makorin1. In 2006 these results were thoroughly refuted as it was shown that Makorin1-p1 is neither imprinted nor expressed [35]: both Makorin1-p1 alleles are methylated and therefore it is a silent pseudogene, reestablishing the idea that mammalian pseudogenes are only “evolutionary relics”. Nonetheless, in 2003, between the Makorin1-p1 work and its refutation, a very relevant review by Balakirev and Ayala was written proposing that pseudogenes can be potogenes, DNA sequences that have the potential to evolve to a new gene [34]. This idea had actually been previously proposed shortly after the discovery of the existence of pseudogenes [20]. This review [34] focused on the evolution of pseudogenes. Drosophila melanogaster was used as an example for the study of pseudogene evolution, mainly due to the extensive experience of the authors in that organism, although pseudogenes are not as frequent in Drosophila as in mammals. This added a different and interesting perspective on the topic. For instance, Drosophila pseudogenes were noted for having more synonymous mutations than deleterious mutations, as well as some conserved functional regions, suggesting that they could actually code for proteins.


But it was also noted that some human pseudogenes evolve to gain function without coding for a protein. The analysis of the upstream regions of the cytokeratin 17 gene and its pseudogene provides an example: it was demonstrated that cis elements 5′ of this pseudogene can interact with enhancer elements of the parental gene, thus inducing reporter gene activity in HeLa cells [36]. Another example was later demonstrated in the studies on the Xist RNA, a gene that at least in part evolved from a gene that had lost its protein-coding ability. Xist is involved in the silencing of one of the female X chromosomes in order to equalize the gene dosage between males (XY) and females (XX) [36]. Based on their contemporary assessment, Balakirev and Ayala proposed the most avant-garde pseudogene definition to date: “Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes”; “they should be defined as DNA sequences derived by duplication or retrotransposition from functional genes that are often subject to natural selection and therefore retain much of the original sequence and structure because they have acquired new regulatory or other functions, or may serve as reservoirs of genetic variability.”

4. Pseudogene involvement in siRNA production in mouse oocytes

In 2008, two letters published back-to-back in Nature reported for the first time that endogenous small interfering RNAs (siRNAs) can arise from pseudogenes in mouse oocytes [37,38]. One of these works [38] reported siRNAs derived from expressed pseudogenes. As an example they demonstrated that the knockout of Dicer, which encodes an enzyme necessary for the processing of miRNAs and siRNAs, resulted in decreased levels of siRNAs derived from pseudogene Au76, and correspondingly there was a 4-fold increase of expression in its parental gene Rangap1.
In the second work [37], evidence was presented for siRNAs being formed by hybridization of pseudogene transcripts to their complementary coding mRNAs, forming duplexes that are subsequently processed by Dicer. These works collectively suggested that pseudogenes can produce siRNAs that regulate their parental genes post-transcriptionally. An open question that remains is whether these siRNAs could also regulate the pseudogene RNA [39]. Most of the pseudogenes that produce siRNAs in mouse have not been found in rat [38]; therefore, it remains to be seen if this mechanism exists in other species.

5. Pseudogenes can regulate their parental gene by decoying parental gene post-transcriptional regulators such as miRNAs

A model has been proposed by Poliseno and coworkers, where pseudogene transcripts can act as a target for miRNAs that actually regulate the parental gene (Fig. 1) [1]. For instance, in a given tissue there would be a balance in the level of expression from both parental gene and pseudogene; if the pseudogene transcription decreases, more miRNAs are able to target the parental gene. On the other hand, an increase in the pseudogene transcription implies that fewer miRNAs will target the parental gene. Therefore, the pseudogenes indirectly regulate the corresponding parental gene by competing for binding to the miRNA. In particular, the effect of increased pseudogene transcription was demonstrated for the human pseudogene PTENP1 and its parental gene, PTEN, a tumor suppressor gene. A region of the PTENP1 transcript has high homology to the 3′ UTR of the PTEN mRNA, thereby allowing some miRNA families that target PTEN (like miR-17, miR-21, miR-214, miR-19 and miR46) to bind the pseudogene transcript. In DU145 prostate cancer cells it was shown that miR-19b and miR-20a are able to repress both the PTEN and PTENP1 levels of RNA, while some other miRNAs derepress both PTEN and PTENP1 RNA levels.

Fig. 1. A generic model for the function of pseudogene transcripts as decoys of miRNAs. (a) A gene X expresses an RNA X that can be translated into a protein. RNA X has a binding site for miRNA Z, which potentially inhibits the production of the protein X by binding to RNA X. Pseudogene X′ expresses an RNA X′ with similarity to RNA X, in particular including the same binding site for miRNA Z. Therefore, RNA X′ competes for miRNA Z with RNA X, effectively acting as a decoy for this miRNA. (b) If RNA X′ is lowly expressed, miRNA Z can bind to RNA X and protein X is not produced. (c) A high level of expression of RNA X′ decoys miRNA Z away from RNA X, which can then produce the protein.

The overexpression of the PTENP1 3′ UTR results in a derepression of both the PTEN mRNA and its protein levels; therefore, a change of the pseudogene RNA level has post-transcriptional and translational consequences. Accordingly, it was demonstrated that in sporadic colon cancer tissue the level of expression arising from the PTENP1 locus is lower [1]. Additional evidence was provided by the findings that PTEN expression is frequently downregulated in cancer [40] and that small changes in the dose of PTEN have critical consequences on epithelial cancer [41].
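The competition logic of the decoy model (Fig. 1) can be made concrete with a deliberately simple Python sketch. This is not the quantitative model of ref. [1]; it rests on assumptions introduced here only for illustration: each transcript carries one binding site, parental and pseudogene transcripts bind the miRNA with equal affinity, and all miRNA molecules are bound. The function name and the copy numbers are hypothetical.

```python
def repressed_fraction(mirna: float, parental_rna: float,
                       pseudogene_rna: float) -> float:
    """Fraction of parental transcripts occupied by the miRNA.

    Toy assumptions (not taken from the source): one binding site per
    transcript, equal affinity for parental and pseudogene transcripts,
    and every miRNA molecule bound to some site.
    """
    total_sites = parental_rna + pseudogene_rna
    if total_sites <= 0:
        raise ValueError("at least one transcript is required")
    # miRNA is shared evenly across all sites; occupancy cannot exceed 1.
    return min(1.0, mirna / total_sites)


# Hypothetical copy numbers: 100 miRNA molecules, 100 parental transcripts.
for pseudo in (0, 100, 300):
    frac = repressed_fraction(mirna=100, parental_rna=100,
                              pseudogene_rna=pseudo)
    print(f"pseudogene RNA = {pseudo:3d} -> repressed fraction = {frac:.2f}")
```

Under these assumptions, raising the pseudogene transcript level from 0 to 300 copies lowers the occupied fraction of parental transcripts from 1.00 to 0.25, which mirrors panels (b) and (c) of Fig. 1 in a qualitative way.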


Similar results were found on the role of KRAS1P, a pseudogene of the human KRAS oncogene [1]. Even though these works expose a role for pseudogenes in cancer, additional research is needed to explore the pseudogene’s general role in disease pathogenesis. Other studies have described the binding of families of miRNAs to both parental gene and pseudogene (connexin 43, CDK4PS, FOXQ3B, E2F3P1 and OCT4). In the case of OCT4, a very complex scenario is proposed where the multiple pseudogenes of OCT4 could be regulating its expression. Indeed, it has been demonstrated that OCT4-pg1 and OCT4-pg5 are expressed only in cancer tissues and not in normal tissues [42]. In addition, an ncRNA has been recently found that binds in antisense to the OCT4-pg5 transcript, increasing OCT4-pg4 and OCT4-pg5 transcription and regulating OCT4 expression [43].

Pseudogene transcripts can also regulate their parental gene by decoying non-miRNA-based stability factors, as discussed in [44]. In particular, it was demonstrated that the transcripts of the HMGA1 gene and of its pseudogene HMGA1-p compete for the alpha-CP1 protein in mouse and human cells [45]. That protein is involved in mRNA stabilization; therefore, an increase in the expression of the pseudogene may result in degradation of the HMGA1 mRNA. This is relevant to human disease because low levels of HMGA1 protein can result in insulin resistance and type 2 diabetes [46].

6. Conclusion

The term pseudogene originated from the resemblance of their sequences to that of the parental gene. Since their discovery they have been mainly considered as evolutionary relics, although this point of view may be slowly changing, as we have illustrated in this review. Comparative sequence analysis can be used to study the speed at which pseudogenes evolve. For example, the processed pseudogene Makorin1-p1 exists in mouse but it is not conserved in rat because the pseudogene originated after their phylogenetic divergence [47].
A comparison of regions covered by the ENCODE project of the human genome to some other species (28 species, mostly mammalian) indicates that the speed of evolution of pseudogenes is faster than that of other genes [48]. Moreover, in this work it was shown that human processed pseudogenes are much less preserved than non-processed pseudogenes, indicating that processed pseudogenes have a recent evolutionary origin. For instance, ~80% of the human processed pseudogenes from the ENCODE regions are primate-specific. Another study of pseudogene evolution demonstrated that ~50% of the RNAs from human pseudogenes are conserved in rhesus monkey but only ~3% in mouse [49]. Unless pseudogenes gain function, they evolve quickly, accumulating mutations that erase any translational capability. Although most of the existing evidence of pseudogene function points to their production of ncRNA transcripts, some cases of protein-coding pseudogenes are known (e.g., PGAM3 in primates [26], which is believed to be an evolved processed pseudogene), and a study indicates 68 human transcribed pseudogenes whose conservation across species suggests that they could be protein-coding [49].

At the transcript level two novel roles have been identified for pseudogenes, both of them operating to regulate the parental gene: 1) encoding siRNAs [38] and 2) decoying miRNAs that target the parental gene [1]. It is suggested that the latter role is more prevalent because there are few known natural antisense transcripts within pseudogene loci compared to the number of sense transcripts that could act as potential miRNA decoy targets [1]. However, there is still no evidence supporting this statement, since the levels of transcription of siRNA within pseudogenes have not been yet quantified for a large range of tissues and conditions.


Detection of transcripts from pseudogenes is a complex issue. Their expression may remain undetected partly due to the use of technologies for transcript level measurement that were not designed with pseudogene expression in mind. The fact that pseudogenes have high similarity to their corresponding parental genes complicates the measurements. Due to this fact, gene prediction algorithms sometimes cannot distinguish between coding genes and pseudogenes [50,51]. For example, microarray probe design was based on the uniqueness of the probe, in order to avoid cross-hybridization, which complicates the detection of pseudogene transcription. In the course of an RNA-seq analysis it is common practice to eliminate the reads that align to multiple genomic locations, which are likely to contain pseudogenes. Another factor complicating the detection of transcripts from pseudogenes is that their level of transcription seems to be much lower than that of their corresponding parental genes [52]. However, it can sometimes be comparable [53], or even higher [1]. Therefore, we expect that some pseudogenes currently without evidence of transcription might eventually be found to be transcribed in some tissues under particular circumstances.

As an alternative to experimental measurements of pseudogene transcription there are computational approaches to detect transcripts arising from pseudogenes. Some of them reuse existing public data. For example, EST/cDNA libraries have been used to provide evidence of transcription arising from human and mouse pseudogenes in a genome-wide computational screen [54]. In order to provide evidence of their functionality, those that had a high degree of conservation were selected. The most important candidates were two pseudogenes from Spinocerebellar ataxia type 1 (ATX1) and Ataxin 7-like 3, but to date, no experimental evidence has been provided of their function.
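The consequence of discarding multi-mapped RNA-seq reads, mentioned above, can be sketched with a few lines of Python. The read identifiers, locus names and the helper `count_unique_reads` are hypothetical; real pipelines work on alignment files rather than on in-memory tuples, so this is only a minimal illustration of the filtering step under those assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A read is represented as (read_id, list of loci it aligns to).
Read = Tuple[str, List[str]]


def count_unique_reads(reads: Iterable[Read]) -> Tuple[Counter, int]:
    """Count reads per locus, keeping only uniquely aligned reads.

    Returns the per-locus counts and the number of discarded
    (multi-mapped) reads.
    """
    counts: Counter = Counter()
    discarded = 0
    for _read_id, loci in reads:
        if len(loci) == 1:
            counts[loci[0]] += 1
        else:
            discarded += 1
    return counts, discarded


# Hypothetical toy data: a gene X, its highly similar pseudogene, and an
# unrelated gene Y. Reads r2 and r3 align equally well to both X loci.
toy_reads: List[Read] = [
    ("r1", ["GENE_X"]),
    ("r2", ["GENE_X", "PSEUDO_X"]),
    ("r3", ["GENE_X", "PSEUDO_X"]),
    ("r4", ["PSEUDO_X"]),
    ("r5", ["GENE_Y"]),
]

counts, discarded = count_unique_reads(toy_reads)
print(dict(counts))  # counts from uniquely aligned reads only
print(discarded)     # number of reads removed by the filter
```

In this toy data set two of the three reads compatible with the pseudogene are removed by the uniqueness filter, so its apparent expression is driven only by the portion of its sequence that has diverged from the parental gene.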
The main problem of this methodology is that little transcription is detected from orthologous pseudogenes. Another computational study that uses the EST/cDNA libraries is our recent work where antisense transcription arising from human pseudogenes has been detected [55]. Different analyses on transcriptional activity from human pseudogenes showed that 2-3% [56] to 4-6% [52] are transcribed. A more recent analysis of transcription from annotated pseudogenes from the ENCODE project demonstrated that at least 20% of the pseudogenes are transcribed [57].

In parallel to the accumulation of evidence on functional pseudogene transcripts there is a natural emergence of reports about some of those being associated with disease. Beyond the above-mentioned pseudogenes related to tumor genes PTEN and KRAS [36], we can point to a study that shows that 90% of cases of congenital adrenal hyperplasia (CAH) are caused by mutations in the CYP21 gene. In 75% of these cases the cause is a chimeric gene, of which the 5′ end originates from the CYP21 pseudogene and the 3′ end from CYP21 [58]. Another chimera of this kind (delta-globin) has been found in lemur (a prosimian) [59]. It is very possible that these chimeras are produced through gene conversions.

In summary, the accumulating research on pseudogene functionality is bringing forth novel questions for research. For example, the discovery that pseudogene transcripts can regulate the parental gene by interacting with ncRNAs raises crucial questions as to how pseudogenes are alternatively spliced [60] or how alternative pseudogene transcript ends are selected [61], both of which affect parental gene regulation. These mechanisms likely depend on developmental stage, tissue and some other particular conditions, including disease states.
It must be noted that even if the main arguments in favor of pseudogene-based post-transcriptional regulation are very recent, the concept had already been introduced in 1980, at a time when there was no transcript evidence arising from pseudogenes and most of the knowledge was based on draft nucleotide sequences for

1920

E.M. Muro et al. / Biochimie 93 (2011) 1916e1921

a small number of pseudogenes: “pseudogene anti-transcripts form ternary complexes with the transcripts of the corresponding productive genes and the small stable nuclear RNAs recently implicated in RNA processing” [62,63]. Three decades after, strong evidence supports that transcription arising from pseudogenes results in the post-transcriptional regulation of parental genes. There are hints that pseudogenes do play a role in disease. It is expected that elucidating the responsible mechanisms will be one of the main motivations for the next works in the field. Acknowledgments We thank the reviewers for extensive comments that helped us to improve this manuscript. References [1] L. Poliseno, L. Salmena, J. Zhang, B. Carver, W.J. Haveman, P.P. Pandolfi, A coding-independent function of gene and pseudogene mRNAs regulates tumour biology, Nature 465 (2010) 1033e1038. [2] A.J. Mighell, N.R. Smith, P.A. Robinson, A.F. Markham, Vertebrate pseudogenes, FEBS Lett. 468 (2000) 109e114. [3] J. Zhu, J.Z. Sanborn, M. Diekhans, C.B. Lowe, T.H. Pringle, D. Haussler, Comparative genomics search for losses of long-established genes on the human lineage, PLoS Comput. Biol. 3 (2007) e247. [4] X. Wang, W.E. Grus, J. Zhang, Gene losses during human origins, PLoS Biol. 4 (2006) e52. [5] Y. Niimura, M. Nei, Evolution of olfactory receptor genes in the human genome, Proc. Natl. Acad. Sci. USA 100 (2003) 12235e12240. [6] M. Nei, Y. Niimura, M. Nozawa, The evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity, Nat. Rev. Genet. 9 (2008) 951e963. [7] Y. Niimura, M. Nei, Evolutionary dynamics of olfactory and other chemosensory receptor genes in vertebrates, J. Hum. Genet. 51 (2006) 505e517. [8] L. Hasan, P. Vogeli, S. Neuenschwander, P. Stoll, E. Meijerink, C. Stricker, H. Jorg, G. Stranzinger, The L-gulono-gamma-lactone oxidase gene (GULO) which is a candidate for vitamin C deficiency in pigs maps to chromosome 14, Anim. Genet. 30 (1999) 309e312. [9] T.J. 
Near, S.K. Parker, H.W. Detrich 3rd, A genomic fossil reveals key steps in hemoglobin loss by the antarctic icefishes, Mol. Biol. Evol. 23 (2006) 2008e2016. [10] E.M. Muro, N. Mah, G. Moreno-Hagelsieb, M.A. Andrade-Navarro, The pseudogenes of Mycobacterium leprae reveal the functional relevance of gene order within operons, Nucleic Acids Res. (2010). [11] D. Ulveling, C. Francastel, F. Hube, When one is better than two: RNA with dual functions, Biochimie 93 (2011) 633e644. [12] C. Jacq, J.R. Miller, G.G. Brownlee, A pseudogene structure in 5S DNA of Xenopus laevis, Cell 12 (1977) 109e120. [13] F.R. Blattner, A.E. Blechl, K. Denniston-Thompson, H.E. Faber, J.E. Richards, J.L. Slightom, P.W. Tucker, O. Smithies, Cloning human fetal gamma globin and mouse alpha-type globin DNA: preparation and screening of shotgun collections, Science 202 (1978) 1279e1284. [14] O. Smithies, A.E. Blechl, K. Denniston-Thompson, N. Newell, J.E. Richards, J.L. Slightom, P.W. Tucker, F.R. Blattner, Cloning human fetal gamma globin and mouse alpha-type globin DNA: characterization and partial sequencing, Science 202 (1978) 1284e1289. [15] E. Lacy, R.C. Hardison, D. Quon, T. Maniatis, The linkage arrangement of four rabbit beta-like globin genes, Cell 18 (1979) 1273e1283. [16] R.C. Hardison, E.T. Butler 3rd, E. Lacy, T. Maniatis, N. Rosenthal, A. Efstratiadis, The structure and transcription of four linked rabbit beta-like globin genes, Cell 18 (1979) 1285e1297. [17] E.F. Fritsch, R.M. Lawn, T. Maniatis, Molecular cloning and characterization of the human beta-like globin gene cluster, Cell 19 (1980) 959e972. [18] J. Lauer, C.K. Shen, T. Maniatis, The chromosomal arrangement of human alpha-like globin genes: sequence homology and alpha-globin gene deletions, Cell 20 (1980) 119e130. [19] E.F. Vanin, G.I. Goldberg, P.W. Tucker, O. Smithies, A mouse alpha-globinrelated pseudogene lacking intervening sequences, Nature 286 (1980) 222e226. [20] N. Proudfoot, Pseudogenes, Nature 286 (1980) 840e841. 
[21] W.H. Li, T. Gojobori, M. Nei, Pseudogenes as a paradigm of neutral evolution, Nature 292 (1981) 237e239. [22] J.E. Karro, Y. Yan, D. Zheng, Z. Zhang, N. Carriero, P. Cayting, P. Harrrison, M. Gerstein, Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation, Nucleic Acids Res. 35 (2007) D55eD60. [23] E.F. Vanin, Processed pseudogenes: characteristics and evolution, Annu. Rev. Genet. 19 (1985) 253e272. [24] K.P. Dudov, R.P. Perry, The gene family encoding the mouse ribosomal protein L32 contains a uniquely expressed intron-containing gene and an unmutated processed gene, Cell 37 (1984) 457e468.

[25] R.C. Scarpulla, Processed pseudogenes for rat cytochrome c are preferentially derived from one of three alternate mRNAs, Mol. Cell Biol. 4 (1984) 2279–2288.
[26] E. Betran, W. Wang, L. Jin, M. Long, Evolution of the phosphoglycerate mutase processed gene in human and chimpanzee revealing the origin of a new primate gene, Mol. Biol. Evol. 19 (2002) 654–663.
[27] M. Kandouz, A. Bier, G.D. Carystinos, M.A. Alaoui-Jamali, G. Batist, Connexin43 pseudogene is expressed in tumor cells and inhibits growth, Oncogene 23 (2004) 4763–4770.
[28] J. Zhang, X. Wang, M. Li, J. Han, B. Chen, B. Wang, J. Dai, NANOGP8 is a retrogene expressed in cancers, FEBS J. 273 (2006) 1723–1730.
[29] J.C. Weill, C.A. Reynaud, The chicken B cell compartment, Science 238 (1987) 1094–1098.
[30] E. Vargas-Madrazo, J.C. Almagro, F. Lara-Ochoa, Structural repertoire in VH pseudogenes of immunoglobulins: comparison with human germline genes and human amino acid sequences, J. Mol. Biol. 246 (1995) 74–81.
[31] S.A. Korneev, J.H. Park, M. O’Shea, Neuronal expression of neural nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA transcribed from an NOS pseudogene, J. Neurosci. 19 (1999) 7711–7720.
[32] J.R. McCarrey, A.D. Riggs, Determinator-inhibitor pairs as a mechanism for threshold setting in development: a possible function for pseudogenes, Proc. Natl. Acad. Sci. USA 83 (1986) 679–683.
[33] S. Hirotsune, N. Yoshida, A. Chen, L. Garrett, F. Sugiyama, S. Takahashi, K. Yagami, A. Wynshaw-Boris, A. Yoshiki, An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene, Nature 423 (2003) 91–96.
[34] E.S. Balakirev, F.J. Ayala, Pseudogenes: are they "junk" or functional DNA? Annu. Rev. Genet. 37 (2003) 123–151.
[35] T.A. Gray, A. Wilson, P.J. Fortin, R.D. Nicholls, The putatively functional Mkrn1p1 pseudogene is neither expressed nor imprinted, nor does it regulate its source gene in trans, Proc. Natl. Acad. Sci. USA 103 (2006) 12039–12044.
[36] S.M. Troyanovsky, R.E. Leube, Activation of the silent human cytokeratin 17 pseudogene-promoter region by cryptic enhancer elements of the cytokeratin 17 gene, Eur. J. Biochem. 225 (1994) 61–69.
[37] O.H. Tam, A.A. Aravin, P. Stein, A. Girard, E.P. Murchison, S. Cheloufi, E. Hodges, M. Anger, R. Sachidanandam, R.M. Schultz, G.J. Hannon, Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes, Nature 453 (2008) 534–538.
[38] T. Watanabe, Y. Totoki, A. Toyoda, M. Kaneda, S. Kuramochi-Miyagawa, Y. Obata, H. Chiba, Y. Kohara, T. Kono, T. Nakano, M.A. Surani, Y. Sakaki, H. Sasaki, Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes, Nature 453 (2008) 539–543.
[39] X. Guo, Z. Zhang, M.B. Gerstein, D. Zheng, Small RNAs originated from pseudogenes: cis- or trans-acting? PLoS Comput. Biol. 5 (2009) e1000449.
[40] A. Carracedo, A. Alimonti, P.P. Pandolfi, PTEN level in tumor suppression: how much is too little? Cancer Res. 71 (2011) 629–633.
[41] A. Alimonti, A. Carracedo, J.G. Clohessy, L.C. Trotman, C. Nardella, A. Egia, L. Salmena, K. Sampieri, W.J. Haveman, E. Brogi, A.L. Richardson, J. Zhang, P.P. Pandolfi, Subtle variations in Pten dose determine cancer susceptibility, Nat. Genet. 42 (2010) 454–458.
[42] G. Suo, J. Han, X. Wang, J. Zhang, Y. Zhao, J. Dai, Oct4 pseudogenes are transcribed in cancers, Biochem. Biophys. Res. Commun. 337 (2005) 1047–1051.
[43] P.G. Hawkins, K.V. Morris, Transcriptional regulation of Oct4 by a long noncoding RNA antisense to Oct4-pseudogene 5, Transcription 1 (2010) 165–175.
[44] R.C. Pink, K. Wicks, D.P. Caley, E.K. Punch, L. Jacobs, D.R. Carter, Pseudogenes: pseudo-functional or key regulators in health and disease? RNA 17 (2011) 792–798.
[45] E. Chiefari, S. Iiritano, F. Paonessa, I. Le Pera, B. Arcidiacono, M. Filocamo, D. Foti, S.A. Liebhaber, A. Brunetti, Pseudogene-mediated posttranscriptional silencing of HMGA1 can result in insulin resistance and type 2 diabetes, Nat. Commun. 1 (2010) 40.
[46] A. Brunetti, G. Manfioletti, E. Chiefari, I.D. Goldfine, D. Foti, Transcriptional regulation of human insulin receptor gene by the high-mobility group protein HMGI(Y), FASEB J. 15 (2001) 492–500.
[47] O. Podlaha, J. Zhang, Nonneutral evolution of the transcribed pseudogene Makorin1-p1 in mice, Mol. Biol. Evol. 21 (2004) 2202–2209.
[48] D. Zheng, A. Frankish, R. Baertsch, P. Kapranov, A. Reymond, S.W. Choo, Y. Lu, F. Denoeud, S.E. Antonarakis, M. Snyder, Y. Ruan, C.L. Wei, T.R. Gingeras, R. Guigo, J. Harrow, M.B. Gerstein, Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution, Genome Res. 17 (2007) 839–851.
[49] A.N. Khachane, P.M. Harrison, Assessing the genomic evidence for conserved transcribed pseudogenes under selection, BMC Genomics 10 (2009) 435.
[50] M.J. van Baren, M.R. Brent, Iterative gene prediction and pseudogene removal improves genome annotation, Genome Res. 16 (2006) 678–685.
[51] M.R. Brent, Steady progress and recent breakthroughs in the accuracy of automated genome annotation, Nat. Rev. Genet. 9 (2008) 62–73.
[52] P.M. Harrison, D. Zheng, Z. Zhang, N. Carriero, M. Gerstein, Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability, Nucleic Acids Res. 33 (2005) 2374–2383.
[53] J. Sorge, E. Gross, C. West, E. Beutler, High level transcription of the glucocerebrosidase pseudogene in normal subjects and patients with Gaucher disease, J. Clin. Invest. 86 (1990) 1137–1141.
[54] O. Svensson, L. Arvestad, J. Lagergren, Genome-wide survey for biologically functional pseudogenes, PLoS Comput. Biol. 2 (2006) e46.

[55] E.M. Muro, M.A. Andrade-Navarro, Pseudogenes as an alternative source of natural antisense transcripts, BMC Evol. Biol. 10 (2010) 338.
[56] Y. Yano, R. Saito, N. Yoshida, A. Yoshiki, A. Wynshaw-Boris, M. Tomita, S. Hirotsune, A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene, J. Mol. Med. 82 (2004) 414–422.
[57] D. Zheng, M.B. Gerstein, The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they? Trends Genet. 23 (2007) 219–224.
[58] H.H. Lee, D.M. Niu, R.W. Lin, P. Chan, C.Y. Lin, Structural analysis of the chimeric CYP21P/CYP21 gene in steroid 21-hydroxylase deficiency, J. Hum. Genet. 47 (2002) 517–522.

[59] A.J. Jeffreys, P.A. Barrie, S. Harris, D.H. Fawcett, Z.J. Nugent, A.C. Boyd, Isolation and sequence analysis of a hybrid delta-globin pseudogene from the brown lemur, J. Mol. Biol. 156 (1982) 487–503.
[60] T.W. Nilsen, B.R. Graveley, Expansion of the eukaryotic proteome by alternative splicing, Nature 463 (2010) 457–463.
[61] E.M. Muro, R. Herrington, S. Janmohamed, C. Frelin, M.A. Andrade-Navarro, N.N. Iscove, Identification of gene 3′ ends by automated EST cluster analysis, Proc. Natl. Acad. Sci. USA 105 (2008) 20286–20290.
[62] M.R. Lerner, J.A. Boyle, S.M. Mount, S.L. Wolin, J.A. Steitz, Are snRNPs involved in splicing? Nature 283 (1980) 220–224.
[63] J. Rogers, R. Wall, A mechanism for RNA splicing, Proc. Natl. Acad. Sci. USA 77 (1980) 1877–1879.