Gene 421 (2008) 1–6
Gene journal homepage: www.elsevier.com/locate/gene
Genome duplication and gene-family evolution: The case of three OXPHOS gene families
Anna De Grassi a, Cecilia Lanave a, Cecilia Saccone a,b,⁎
a Istituto di Tecnologie Biomediche, Sede di Bari, CNR, Bari, Italy
b Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Bari, Italy
ARTICLE INFO
Article history: Received 19 July 2007; received in revised form 15 May 2008; accepted 21 May 2008; available online 23 June 2008
Keywords: Genome duplication; Oxidative phosphorylation
ABSTRACT
DNA duplication is one of the main forces acting on the evolution of organisms because it creates the raw genetic material that natural selection can subsequently modify. Duplicated regions are mainly due to “errors” in different phases of meiosis, but DNA transposable elements and reverse transcription also help amplify genomic material and move it to different genomic locations. As a result, redundancy affects genomes to variable degrees, from the single gene to the whole genome (whole genome duplication, WGD). Gene families are groups of genes created by duplication, and their size reflects the number of duplicated genes, called paralogs, in each species. This review describes the state of the art in the identification and analysis of gene families in eukaryotes. It gives specific attention to families generated by ancient large-scale events in vertebrates (WGD or large segmental duplications). As a case study, we report our work on the evolution of gene families encoding subunits of the five OXPHOS (oxidative phosphorylation) complexes, which are fundamental and highly conserved in all respiring cells. OXPHOS gene families are generally smaller than nuclear gene families, but there are exceptions, such as three gene families with at least two paralogs in vertebrates. These gene families encode cytochrome c (Cyt c, the electron shuttle protein between complexes III and IV), lipid binding protein (LBP, the channel protein of complex V which transfers protons through the inner mitochondrial membrane) and the MLRQ subunit (MLRQ, a supernumerary subunit of the large complex I, with unknown function). We provide a two-step approach, based on structural genomic data, to show that these gene families most likely arose through WGD (or large segmental duplication) events at the origin of vertebrates. Only afterwards did they undergo species-specific events of further gene duplication and loss.
In summary, this review reflects the need to apply comparative genomic approaches, deriving from both “classical” molecular phylogenetic analysis and “new” genome map analysis. Such approaches are needed to define the complex evolutionary relationships between gene family members, which in turn are essential for any further comparative phylogenetic or functional result. © 2008 Elsevier B.V. All rights reserved.
Abbreviations: WGD, whole genome duplication; My (Mya), million years (ago); OXPHOS, oxidative phosphorylation; LBP, lipid binding protein; Cyt c, cytochrome c.
⁎ Corresponding author. Istituto di Tecnologie Biomediche, Sede di Bari, CNR, Bari, Italy. Tel.: +39 080 5929661; fax: +39 080 5929690. E-mail address: [email protected] (C. Saccone).
0378-1119/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2008.05.011
1. Genome redundancy and gene families
Genome complexity has historically been associated with studies of DNA reassociation kinetics, which can reveal the relative amounts of different genomic fractions according to their degree of repetitiveness (Marx et al., 1976). The greater the total length of single-copy DNA in a genome, the more complex and potentially informative the genome is. Conversely, identical or similar DNA sequences increase genome redundancy and reduce its complexity. Redundancy is a distinctive trait of eukaryotic genomes and is generally assumed to be generated by DNA duplication. In 1970 Susumu Ohno (Ohno, 1970) had the brilliant intuition that duplication is the main force acting on the evolution of organisms, because it creates the raw genetic material that natural selection can subsequently modify. The results of current genome sequencing projects have widely confirmed this intuition. For example, complete genomic sequences from diverse phylogenetic lineages show that organisms with a small population size tolerate redundancy better (Lynch and Conery, 2003). According to this theory, the large population size of prokaryotes imposes a barrier to their evolution by duplication. In multicellular eukaryotes, by contrast, population size has been reduced over the long term, in correlation with the increase in organism size. This amplifies the effect of genetic drift and allows these organisms to tolerate the passive emergence of genomic alterations such as duplications. Redundancy affects both coding and non-coding DNA sequences. A gene family is the result of gene duplication and its size reflects the
number of duplicated genes, called paralogs, in each species. Changes in gene family size are very widespread among organisms, even over relatively short evolutionary distances. They produce different relationships between the gene family members of two species: one-to-one, one-to-many or many-to-many. According to the definition of Fitch (Fitch, 1970), paralogs are created by a horizontal event (duplication), while orthologs reflect a vertical event (speciation). The variable size of gene families and the continual gain and loss of genes during genome evolution often mask the real one-to-one relationship between homologous genes of different organisms. This makes the identification of orthologs very difficult and sometimes impossible, so it is better to speak of “putative orthologs”. These are generally identified using one of three methods: (1) reciprocal best sequence similarity scores between homologous genes from different organisms (Altschul et al., 1990; Tatusov et al., 2001; O'Brien et al., 2005), (2) the phylogenetic approach based on genetic distance and tree topology (Fulton et al., 2006; Li et al., 2006) or (3) conserved gene location along species-specific chromosomes (synteny) (Zheng et al., 2005; Nozawa and Nei, 2007). It should also be considered that expansions and contractions of gene families are rarely related to underlying biological differences. There are, however, some examples. Humans, apes and Old World monkeys are trichromatic, i.e. they have three types of cone photopigments with different spectral sensitivities. Rodents, by contrast, are dichromatic. The difference is due to a primate-specific duplication of an opsin gene that is absent from the mouse genome (Bowmaker, 1998). Moreover, it is well known that mice retain more paralogs encoding olfactory receptors than humans do. This has been linked to their greater sensory reliance on smell (Niimura and Nei, 2005).
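A minimal Python sketch can illustrate method (1) above, reciprocal best similarity scores. It assumes that pairwise similarity scores between the genes of two species (for example, BLAST bit scores) have already been computed. The gene names and scores are hypothetical, as are the functions `best_hits` and `reciprocal_best_hits`. The sketch does not reproduce any specific published pipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: (gene in species X, gene in species Y) -> similarity score,
# e.g. a BLAST bit score computed beforehand.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, x_to_y: bool) -> Dict[str, str]:
    """Map each query gene to its highest-scoring partner in the other species."""
    best: Dict[str, Tuple[str, float]] = {}
    for (x, y), score in scores.items():
        query, target = (x, y) if x_to_y else (y, x)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> Set[Tuple[str, str]]:
    """Keep (x, y) only if y is x's best hit and x is also y's best hit."""
    forward = best_hits(scores, x_to_y=True)
    backward = best_hits(scores, x_to_y=False)
    return {(x, y) for x, y in forward.items() if backward.get(y) == x}


if __name__ == "__main__":
    # Toy scores: X has two paralogs (xA, xB); Y has one homolog (y1)
    # and an unrelated gene (y2). All names and values are made up.
    toy = {("xA", "y1"): 250.0, ("xB", "y1"): 180.0, ("xB", "y2"): 40.0}
    print(reciprocal_best_hits(toy))  # {('xA', 'y1')}
```

In this toy case the paralog xB receives no ortholog call, even though it is homologous to y1. The one-to-many relationship collapses to a single pair, which is one reason why such calls are best treated as putative orthologs.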
However, the relationship between the number of olfactory receptor genes and olfactory ability is not straightforward. In this regard, Nei (2007) proposed the concept of genomic drift, in which gene copy number changes through random duplication and deletion of genes within the range required by each species. Thus, in most cases, the extent, mode and biological effects of changes in gene family size remain largely unknown.
2. The functional fate of paralogs
Duplications are mainly due to “errors” in different phases of meiosis (from DNA replication to crossing over and chromosome segregation) or can be mediated by specific transposable elements,
which amplify genomic material and move it to a different genomic location. Finally, they can also be generated by reverse transcription: an mRNA intermediate is reverse-transcribed into DNA and reinserted at random genomic locations. In general, unlike “classical” duplicated genes, retrogenes immediately lose introns, promoters and their original genomic context. However, several authors have shown (e.g., Ejima and Yang, 2003; Nozawa et al., 2005) that retrogenes may contain introns and promoters as well as exons. Duplications affect genomic regions of variable size: a single gene, a cluster of genes (see Fig. 1), larger genomic segments or the whole genome. In eukaryotes, gene duplications arise at a rate of about 0.01 per gene per million years (Lynch and Conery, 2000), the same order of magnitude as the mutation rate per nucleotide site (Li, 1999). Early in their evolutionary life, paralogs appear to undergo a short period of shared “relaxed” selection and evolve neutrally. After this period they follow different fates (see Fig. 1). Most paralogs are lost within a few million years. The gene sequence loses its coding potential through degenerative mutations (“non-functionalization”). It can then be maintained in the genome as a relic of the functional sequence (pseudogene) or be lost altogether (Lynch and Conery, 2000; Maere et al., 2005). Only a few paralogs are preserved and undergo purifying selection. According to the “neo-functionalization” model, one copy keeps the ancestral function while the other gains a new function, which natural selection preserves. In “subfunctionalization”, each paralog loses a different subset of the functions of the ancestral copy, so that together they complement the original function (Force et al., 1999).
This model, known as “duplication–degeneration–complementation” (DDC), assumes that paralogs undergo complementary degeneration of cis-regulatory motifs. Together, the paralogs then reproduce the whole cis-regulatory complement of the ancestral gene. Contrary to this prediction, a recent study has shown that the number of cis-regulatory motifs is constant for all paralogs in yeast and does not decrease with increasing gene family size (Papp et al., 2003b). The same authors suggest that the expression difference is driven by the reduction in shared regulatory motifs, rather than by the absolute number of motifs in each paralog. A further model, called “sub-neofunctionalization”, states that subfunctionalization occurs rapidly after the duplication event but is often accompanied by neo-functionalization. This model points to the important role of positive selection in maintaining paralogs (Shiu et al., 2006).
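The DDC logic can be made concrete with a toy Python model in which each gene's regulatory complement is a set of motif labels. This is a sketch under strong assumptions. The ancestral motif set is taken as known, and motif labels such as `M1` are hypothetical. The Jaccard index used in `shared_fraction` is only one possible way to quantify the shared motifs emphasized by Papp et al. (2003b); it is not their method.

```python
from typing import Set


def classify_pair(ancestral: Set[str], p1: Set[str], p2: Set[str]) -> str:
    """Label a duplicate pair by comparing its motif sets with the ancestor's."""
    combined = p1 | p2
    gained = combined - ancestral          # motifs the ancestor did not have
    restored = ancestral <= combined       # the two copies jointly cover the ancestor
    split = not ancestral <= p1 and not ancestral <= p2  # neither copy does alone
    if restored and split:
        # DDC: complementary losses; extra gains suggest sub-neofunctionalization.
        return "sub-neofunctionalization" if gained else "subfunctionalization"
    if gained:
        return "neo-functionalization"
    if restored:
        return "redundant"  # at least one copy still carries the full ancestral set
    return "partial loss"


def shared_fraction(p1: Set[str], p2: Set[str]) -> float:
    """Fraction of the pair's motifs present in both copies (Jaccard index)."""
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 1.0


if __name__ == "__main__":
    ancestor = {"M1", "M2", "M3", "M4"}  # hypothetical motif labels
    print(classify_pair(ancestor, {"M1", "M2"}, {"M3", "M4"}))  # subfunctionalization
    print(shared_fraction({"M1", "M2"}, {"M3", "M4"}))           # 0.0
```

When each copy keeps a different half of the ancestral motifs, the pair is labelled subfunctionalization and shares no motifs. When one copy keeps the full set and its partner gains a new motif, the pair is labelled neo-functionalization.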
Fig. 1. Gene duplication and modification. Example of a double duplication event that affects a genomic region with three genes (1, 2, 3) and creates four paralogons (see Section 3). The subsequent events that can modify the function of paralogs are also indicated.
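A toy Python simulation can mimic the double duplication sketched in Fig. 1. A three-gene region is copied in two successive rounds, giving four paralogons, and gene copies are then lost at random. The retention probability, the seed and all function names are illustrative assumptions. The model ignores rearrangements, so it only shows how paralogons end up sharing partial, order-preserving subsets of the original gene families.

```python
import random
from typing import List

Region = List[str]  # a genomic region as an ordered list of gene-family labels


def duplicate_all(regions: List[Region]) -> List[Region]:
    """One round of duplication in which every region is copied once."""
    return [list(region) for region in regions for _ in range(2)]


def random_loss(regions: List[Region], retain_prob: float,
                rng: random.Random) -> List[Region]:
    """Keep each gene copy independently with probability retain_prob."""
    return [[g for g in region if rng.random() < retain_prob] for region in regions]


def shared_families(a: Region, b: Region) -> int:
    """Number of gene families two regions have in common."""
    return len(set(a) & set(b))


def same_order(a: Region, b: Region) -> bool:
    """True if the families shared by a and b appear in the same relative order."""
    shared = set(a) & set(b)
    return [g for g in a if g in shared] == [g for g in b if g in shared]


if __name__ == "__main__":
    rng = random.Random(1)                                          # fixed seed
    paralogons = duplicate_all(duplicate_all([["1", "2", "3"]]))    # 1 -> 2 -> 4 regions
    paralogons = random_loss(paralogons, retain_prob=0.6, rng=rng)  # hypothetical rate
    for i, a in enumerate(paralogons):
        for b in paralogons[i + 1:]:
            print(a, b, shared_families(a, b), same_order(a, b))
```

Because no inversions or translocations are modelled, `same_order` is always true here. In real genomes, such rearrangements and further duplications are what make paralogon detection difficult.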
The expression level and the stage or tissue specificity of paralogs indicate their functional divergence. Studies on yeasts show that sequence divergence is positively correlated with divergence in the expression levels of paralogs (Zhang et al., 2004; Gu et al., 2002; Gu et al., 2004). It is also correlated with the reduction in shared cis-regulatory motifs (Papp et al., 2003b). Similar results have been obtained in humans, for which the expression divergence of paralogs has been measured across 24 different tissues (Makova and Li, 2003). Finally, a general pattern has emerged in mammals: the larger the gene family, the lower the expression level of its paralogs and the higher their tissue specificity (Huminiecki and Wolfe, 2004).
3. Whole genome duplications and paralogs
Polyploidization, or whole genome duplication (WGD), is the most extensive duplication event. Immediately after duplication, the whole gene complement is increased quantitatively but not qualitatively. Only afterwards are paralogs lost or subjected to sequence divergence. Paralogs generated in this way are also called ohnologs (Wolfe, 2000), after Ohno, who first formulated this hypothesis (Ohno, 1970). There is strong evidence of WGD events in the evolutionary history of many of the main eukaryotic lineages. Examples include the ciliate Paramecium tetraurelia (Aury et al., 2006), the yeast Saccharomyces cerevisiae (Wolfe and Shields, 1997; Kellis et al., 2004), the plant Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000; Blanc and Wolfe, 2004), teleost fishes (Amores et al., 1998; Vandepoele et al., 2004; Jaillon et al., 2004) and amphibians (Flajnik and Kasahara, 2001). By contrast, there is no evidence of WGD in insects or nematodes. Detecting a WGD event is neither a simple nor a standardized procedure because, as pointed out above, further duplication, loss and rearrangement events erode the duplicated genome structure.
Its reconstruction is currently carried out by means of two different approaches: temporal and spatial. The temporal analysis is based on the principle that ohnologs originate simultaneously. They can therefore be identified as a large peak in the distribution of paralogs as a function of divergence time (Van de Peer, 2004). The spatial approach, in contrast, consists of identifying “paralogons”: genomic segments, in different chromosomal locations, that harbor a similar set of paralogs in a similar order (see Fig. 1). In this case, the evidence of WGD is that paralogons are uniformly distributed along the chromosomes and cover most of the genome (Panopoulou and Poustka, 2005). Further evidence of WGD is a two-to-one relationship between paralogons of the duplicated genome and the corresponding regions in the genome of a closely related organism that diverged before the WGD (Vandepoele et al., 2002; Kellis et al., 2004; Jaillon et al., 2004). The Paramecium genome underwent three successive rounds of WGD (Aury et al., 2006). Old and recent paralogs encoding complex subunits have been detected and related to the different WGD rounds. This analysis shows that such paralogs are under-retained when they derive from ancient rounds and over-retained when they derive from recent ones. These results suggest that an organism initially tends to maintain most of these paralogs, probably to preserve a balance in their relative dosage. Over a longer period, the majority of them are gradually lost. The 1R/2R hypothesis postulates a specific case of WGD: a double WGD in the early evolutionary history of vertebrates, about 500–340 Mya (Garcia-Fernandez and Holland, 1994). As a consequence, all vertebrates are paleopolyploid (derived from an ancient shared polyploid ancestor). Several pieces of evidence support this hypothesis. (1) Vertebrates possess at least four Hox clusters, while invertebrates have only one.
Hox clusters are highly important blocks of tandemly duplicated genes. The spatial order of the genes along the cluster reflects the temporal and spatial order in which they are expressed along the anterior–posterior body axis during vertebrate embryogenesis (Garcia-Fernandez, 2005). (2) Human paralogons cover 80% of the human genome and are uniformly
distributed (McLysaght et al., 2002). (3) Analysis of the entire Tetraodon nigroviridis genome, which possesses seven different Hox clusters, demonstrates a third WGD event specific to teleost fishes. This analysis used the human genome as a reference that did not undergo this event (Jaillon et al., 2004). (4) The human map of ancient genes (those predating the split of tetrapods and fishes) reveals a tetra-paralogy pattern that can only be explained by a double duplication event (Dehal and Boore, 2005). Previous studies (Panopoulou and Poustka, 2005) have tried to identify human gene families generated by vertebrate WGDs. They did so by detecting human paralogs that are present as a single copy in the invertebrate Ciona and are located on more than one human HOX chromosome (a chromosome harboring a HOX cluster). This approach has two limitations: (1) it wrongly includes human paralogs located by chance on HOX chromosomes and (2) it excludes all gene families with several paralogs in both humans and Ciona,
Fig. 2. Orthologs and paralogs physically associated with HOX clusters between and within vertebrate genomes: a) Location of orthologous genes flanking the HOXD clusters in four vertebrates. Each chromosome is indicated by a grey vertical line (Homo sapiens chromosome 2, Mus musculus chromosome 2, Tetraodon nigroviridis chromosome 2, Danio rerio chromosome 9). The red arrow points to the location of the HOXD cluster along the human chromosome. Each line linking two chromosomes represents the orthology relationship between two genes at the corresponding chromosome locations. A line runs to the base of the chromosome when the ortholog is not detected in the corresponding species. Red lines represent orthologs conserved in all the analysed species, light blue lines orthologs conserved in three species, and black lines orthologs conserved only in one mammal and one fish. Note that red lines lie tightly adjacent to the HOX cluster. Note also that humans share some orthologs with T. nigroviridis but not with mouse, owing to the highly rearranged mouse genome. b) Location of paralogous genes flanking the four human HOX clusters. The whole human chromosomes (2, 7, 12, 17) harbouring HOX clusters are arranged in a circle. A star along the corresponding chromosome marks the location of each of the four human HOX clusters (A, B, C, D). Black lines represent the paralogy relationship between human genes flanking HOX clusters.
which could also be explained by WGD in humans and lineage-specific duplications in Ciona. All vertebrates underwent a massive loss of paralogs after the double genome duplication. Even so, most of the paralogs in current vertebrate genomes still derive from WGDs: 80% of the paralogs in tetrapods and 50–70% of the paralogs in fishes (Blomme et al., 2006). However, an opposing view questions the very existence of WGDs in vertebrates. This view is based on molecular phylogenetic and protein distance studies. It relies mainly on the incongruence between the observed tree topologies of some vertebrate-specific gene families and the topology expected under WGD (Friedman and Hughes, 2001; Hughes and Friedman, 2003; Hughes and Friedman, 2004). This method, however, has limitations in estimating the age of duplications in several instances, e.g. because of the short interval between the two rounds of duplication (Gibson and Spring, 2000; Panopoulou and Poustka, 2005).
4. Evolution of gene families for protein complex subunits: OXPHOS gene families
Several studies have shown that most paralogs generated by WGD in eukaryotes are associated with specific processes, such as transcription regulation or signal transduction, which are generally performed by multiprotein complexes (Maere et al., 2005; Blanc and Wolfe, 2004; Seoighe and Gehring, 2004; Davis and Petrov, 2005). On the other hand, other studies have shown that, in both human and yeast, members of large gene families are rarely involved in complexes (Yang et al., 2003;
Papp et al., 2003a). In this context, the “balance hypothesis” may help explain the rules governing the fate of protein complex paralogs generated by duplication. According to this hypothesis, the subunits of multiprotein complexes need to maintain a balance in their relative dosage to allow correct assembly and functioning of the complex (Papp et al., 2003a; Freeling and Thomas, 2006). The analysis of the Paramecium tetraurelia genome (Aury et al., 2006) provides elegant evidence for a two-step mechanism. This mechanism can explain both the higher and the lower duplication levels of paralogs encoding multiprotein complex subunits. Our group has studied the evolution of gene families encoding subunits of the five OXPHOS (oxidative phosphorylation) complexes, which are fundamental and highly conserved in all respiring cells. We wanted to compare the degree of expansion of OXPHOS gene families in Metazoa with the global trend in nuclear gene families. To do so, we computed the average number of genes per family in the whole nuclear genome by dividing the total number of nuclear genes by the number of associated gene families (see Table 2 in De Grassi et al., 2005). In all the species considered, the average size of both “OXPHOS families” and “accessory OXPHOS families” is smaller than the average size of nuclear gene families. On the whole, this suggests that OXPHOS genes are less likely than nuclear genes in general to form duplicates or to preserve them, in both invertebrates and vertebrates. However, we observed an expansion of gene family size from invertebrates to vertebrates, correlated with the increase in genome size from insects to vertebrates. Intriguingly, neither of the two tunicate genomes contains duplicates of OXPHOS genes. Tunicates (Urochordata) are non-vertebrate deuterostomes with some of the
Fig. 3. Phylogenetic trees of LBP, Cyt c and MLRQ gene families in Metazoa. a) Tree topology according to the genomic co-location of vertebrate paralogs and Hox clusters. Black spots represent the first duplication event leading to paralogs in the vertebrate lineage. Each paralog is represented by the Hox cluster it is physically associated with (A: HoxA cluster, B: HoxB cluster, C: HoxC cluster, D: HoxD cluster). Letters in brackets stand for paralogs lost in all vertebrate species. b) Phylogenetic trees obtained by sequence analysis. Black spots represent the first duplication event leading to paralogs in the vertebrate lineage. Circles group orthologs found in species of several classes of vertebrates (m: mammals, g: the bird Gallus gallus, a: amphibians, f: fishes) and invertebrates (i). Hox clusters physically associated with paralogs are indicated (A, B, C, D). NMES orthologs (genes not circled) were included in the MLRQ tree because they belong to this family by sequence similarity. However, they are not syntenic with the Hox clusters and are conserved from urochordates to mammals, so they predate the WGD events. Therefore, in this case, the first duplication event leading to the MLRQ paralogs in the vertebrate lineage is placed after the split from the NMES cluster.
smallest genomes (about 180 Mb), and represent a key reference group between invertebrates and vertebrates. Indeed, Urochordata arose about 800 Mya, just before the one or two genome-doubling events at the origin of vertebrates (Vandepoele et al., 2004). They then underwent numerous phases of independent lineage-specific gene duplication and loss (Holland and Gibson-Brown, 2003). Tunicates completely lack duplicated OXPHOS genes, an exception to the trend of gene family expansion from insects to vertebrates. This suggests that the Ciona genome has lost any duplicated genes it had since its separation from the common ancestor. This is in line with the general observation that considerable DNA loss has occurred in the tunicate lineage. Moreover, a number of genes that are clustered in most other bilaterians, e.g. Hox genes, are not clustered in Ciona (Holland and Gibson-Brown, 2003). Although OXPHOS gene families seem smaller than nuclear gene families in general, some exceptions have been noted. In particular, we have observed three gene families with at least two paralogs in vertebrates. These gene families encode cytochrome c (Cyt c, the electron shuttle protein between complexes III and IV), lipid binding protein (LBP, the channel protein of complex V which transfers protons through the inner mitochondrial membrane) and the MLRQ subunit (MLRQ, a supernumerary subunit of the large complex I, with unknown function). The phylogenetic analysis of the Cyt c and LBP gene families is reported in a previous paper (De Grassi et al., 2006) and allowed us to partially reconstruct their evolutionary history. We have found that the members of these gene families are also located in genomic regions flanking Hox clusters in vertebrates. Fig. 2 reports an example of the two-step approach we used to detect ohnologs independently of the duplication level of paralogs in invertebrates.
The first step consisted of detecting the genes flanking HOX clusters that lie in syntenic segments conserved in four vertebrates (Fig. 2a). These orthologs are assumed to have occupied the same chromosomal location before the split of mammals and fishes. The second step further selected the paralogs that flank at least two different HOX clusters (Fig. 2b). This reinforces the hypothesis that their duplication coincides with HOX cluster duplication and hence with the vertebrate-specific events. WGDs can be assumed, but we cannot completely exclude large segmental duplications. Using this approach, we detected a co-localization of the three gene families with the Hox clusters. This indicates that they most likely arose through the same WGD (or large segmental duplication) events at the origin of vertebrates, and only afterwards underwent species-specific events of further gene duplication and loss. Fig. 3a shows the tree topology these paralogs would be expected to produce, given their common origin and co-localization with Hox clusters. The (AB)(CD) model proposes two sequential duplications, the first of which gave rise to a proto-AB Hox cluster and a proto-CD Hox cluster. Two independent approaches support this model: comparative analysis of cluster content and sequence analysis (Zhang and Nei, 1996; Amores et al., 1998). In contrast, the classical approach, based on molecular sequence analysis and the resulting tree topology (Fig. 3b), gives discordant results when reconstructing the long-distance evolutionary relationships of Cyt c, LBP and MLRQ paralogs in vertebrates. Previous studies have also noted this problem: the tree topology of paralogs generated by WGDs in vertebrates often does not reflect the expected (AB)(CD) topology (Larhammar et al., 2002). This may be due to the short interval between the two WGD events (90–100 My) and the massive subsequent loss of paralogs (Wang and Gu, 2000).
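The two selection steps and the comparison against the expected (AB)(CD) topology can be sketched in Python. This is a hypothetical illustration, not the pipeline actually used, and gene and family names are invented. Step 1 is simplified to keeping human HOX-flanking genes whose syntenic ortholog is also found in at least one of the two fish genomes, so that the location predates the mammal–fish split. Trees are assumed to be rooted and written as nested tuples. Their leaves are the HOX cluster (A–D) with which each paralog is associated. Lost paralogs, shown in brackets in Fig. 3a, are handled by pruning the expected clades to the surviving clusters.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Union

# Assumption: the two fish genomes serve as outgroups to the mammals.
FISHES = {"T. nigroviridis", "D. rerio"}

Tree = Union[str, tuple]  # leaf: HOX cluster label; internal node: tuple of children


def step1_conserved(flanking: Dict[str, Set[str]]) -> Set[str]:
    """Step 1: keep human HOX-flanking genes whose ortholog is also syntenic
    in at least one fish, i.e. whose location predates the mammal-fish split."""
    return {gene for gene, species in flanking.items() if species & FISHES}


def step2_multi_cluster(conserved: Set[str], family_of: Dict[str, str],
                        cluster_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Step 2: keep families whose step-1 members flank at least two HOX clusters."""
    clusters_by_family: Dict[str, Set[str]] = defaultdict(set)
    for gene in conserved:
        clusters_by_family[family_of[gene]].add(cluster_of[gene])
    return {fam: cl for fam, cl in clusters_by_family.items() if len(cl) >= 2}


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every internal node of a rooted nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def consistent_with_ab_cd(tree: tuple) -> bool:
    """True if the tree does not contradict ((A,B),(C,D)) once the expected
    clades are pruned to the clusters whose paralog survived."""
    observed = clades(tree)
    leaves = max(observed, key=len)  # the root clade contains every leaf
    for expected in (frozenset("AB"), frozenset("CD")):
        pruned = expected & leaves
        if 2 <= len(pruned) < len(leaves) and pruned not in observed:
            return False
    return True


if __name__ == "__main__":
    # Invented toy data: human gene -> species with a syntenic HOX-flanking ortholog.
    flanking = {"g1": {"M. musculus", "D. rerio"}, "g2": {"T. nigroviridis"},
                "g3": {"M. musculus"}, "g4": {"D. rerio"}}
    family_of = {"g1": "famX", "g2": "famX", "g3": "famY", "g4": "famY"}
    cluster_of = {"g1": "A", "g2": "C", "g3": "B", "g4": "D"}
    kept = step1_conserved(flanking)                         # g3 is dropped (no fish)
    print(step2_multi_cluster(kept, family_of, cluster_of))  # only famX remains
    print(consistent_with_ab_cd((("A", "C"), ("B", "D"))))   # False
```

With these toy inputs, gene g3 is discarded at step 1 because it lacks a fish ortholog. Only famX, whose retained members flank the A and C clusters, passes step 2. A sequence tree grouping A with C, as in the final call, would be flagged as inconsistent with the (AB)(CD) expectation.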
However, the molecular approach can successfully detect real orthology relationships over relatively short evolutionary distances. Examples are relationships among organisms of the same class, such as mammals, birds, amphibians or fishes (Fig. 3b), including species for which chromosome maps are not yet available. In this case we observed, for example, that there were two Cyt c paralogs in the ancestral vertebrate, both of which are conserved only
in the Gallus gallus and rodent genomes (see Fig. 3), whereas humans and fishes (T. nigroviridis and D. rerio) each possess a single, non-orthologous gene. In summary, this example reflects the need to apply comparative genome approaches, deriving from both “classical” molecular phylogenetic analysis and “new” genome map analysis, to define the complex evolutionary relationships between gene family members. This is an essential step that should be carried out alongside other comparative phylogenetic or functional analyses.
Acknowledgments
This work has been supported by grants from MIUR: Cluster C03 Prog. 2 L.488/92; PON - Avviso n. 68 del 23.01.02 Progetto B.I.G; Contributo Straordinario D.M. n. 1105 del 09/10/2002 (Progetto n. 187); PNR 2001–2003 (FIRB art.8) D.M.199, Strategic Program: Post-genome, grant 31-063933; FIRB 2003 art. 8 D.D. 2187 del 12-12-2003 LIBI.
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Amores, A., et al., 1998. Zebrafish Hox clusters and vertebrate genome evolution. Science 282, 1711–1714.
Arabidopsis Genome Initiative, 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
Aury, J.M., et al., 2006. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444, 171–178.
Blanc, G., Wolfe, K.H., 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16, 1667–1678.
Blomme, T., Vandepoele, K., De Bodt, S., Simillion, C., Maere, S., Van de Peer, Y., 2006. The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 7, R43.
Bowmaker, J.K., 1998. Evolution of colour vision in vertebrates. Eye 12 (Pt 3b), 541–547.
Davis, J.C., Petrov, D.A., 2005. Do disparate mechanisms of duplication add similar genes to the genome? Trends Genet. 21, 548–551.
De Grassi, A., Caggese, C., D'Elia, D., Lanave, C., Pesole, G., Saccone, C., 2005. Evolution of nuclearly encoded mitochondrial genes in Metazoa. Gene 354, 181–188.
De Grassi, A., Lanave, C., Saccone, C., 2006. Evolution of ATP synthase subunit c and cytochrome c gene families in selected Metazoan classes. Gene 371, 224–233.
Dehal, P., Boore, J.L., 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3, e314.
Ejima, V., Yang, L., 2003. Trans mobilization of genomic DNA as a mechanism for retrotransposon-mediated exon shuffling. Hum. Mol. Genet. 12, 1321–1328.
Fitch, W.M., 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19 (2), 99–113.
Flajnik, M.F., Kasahara, M., 2001. Comparative genomics of the MHC: glimpses into the evolution of the adaptive immune system. Immunity 15, 351–362.
Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., Postlethwait, J., 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545.
Freeling, M., Thomas, B.C., 2006. Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 16, 805–814.
Friedman, R., Hughes, A.L., 2001. Pattern and timing of gene duplication in animal genomes. Genome Res. 11, 1842–1847.
Fulton, D.L., Li, Y.Y., Laird, M.R., Horsman, B.G., Roche, F.M., Brinkman, F.S., 2006. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 7, 270.
Garcia-Fernandez, J., Holland, P.W., 1994. Archetypal organization of the amphioxus Hox gene cluster. Nature 370, 563–566.
Garcia-Fernandez, J., 2005. The genesis and evolution of homeobox gene clusters. Nat. Rev., Genet. 6, 881–892.
Gibson, T.J., Spring, J., 2000. Evidence in favour of ancient octaploidy in the vertebrate genome. Biochem. Soc. Trans. 28 (2), 259–264.
Gu, Z., Nicolae, D., Lu, H.H., Li, W.H., 2002. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 18, 609–613.
Gu, Z., Rifkin, S.A., White, K.P., Li, W.H., 2004. Duplicate genes increase gene expression diversity within and between species. Nat. Genet. 36, 577–579.
Holland, L.Z., Gibson-Brown, J.J., 2003. The Ciona intestinalis genome: when the constraints are off. Bioessays 25 (6), 529–532.
Hughes, A.L., Friedman, R., 2003. 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J. Struct. Funct. Genomics 3 (1-4), 85–93.
Hughes, A.L., Friedman, R., 2004. Pattern of divergence of amino acid sequences encoded by paralogous genes in human and pufferfish. Mol. Phylogenet. Evol. 32 (1), 337–343.
Huminiecki, L., Wolfe, K.H., 2004. Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res. 14, 1870–1879.
Jaillon, O., et al., 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431, 946–957.
Kellis, M., Birren, B.W., Lander, E.S., 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624.
Larhammar, D., Lundin, L.G., Hallbook, F., 2002. The human Hox-bearing chromosome regions did arise by block or chromosome (or even genome) duplications. Genome Res. 12, 1910–1920.
Li, H., et al., 2006. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 34, D572–D580.
Li, W.H., 1999. Molecular Evolution. Sinauer, Sunderland, MA.
Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.
Lynch, M., Conery, J.S., 2003. The origins of genome complexity. Science 302, 1401–1404.
Maere, S., et al., 2005. Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. U. S. A. 102, 5454–5459.
Makova, K.D., Li, W.H., 2003. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 13, 1638–1645.
Marx, K.A., Allen, J.R., Hearst, J.E., 1976. Characterization of the repetitious human DNA families. Biochim. Biophys. Acta 425 (2), 129–147.
McLysaght, A., Hokamp, K., Wolfe, K.H., 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31, 200–204.
Nei, M., 2007. The new mutation theory of phenotypic evolution. Proc. Natl. Acad. Sci. U. S. A. 104, 12235–12242.
Niimura, Y., Nei, M., 2005. Evolutionary changes of the number of olfactory receptor genes in the human and mouse lineages. Gene 346, 23–28.
Nozawa, M., Aotsuka, T., Tamura, K., 2005. A novel chimeric gene, siren, with retroposed promoter sequence in the Drosophila bipectinata complex. Genetics 171, 1719–1727.
Nozawa, M., Nei, M., 2007. Evolutionary dynamics of olfactory receptor genes in Drosophila species. Proc. Natl. Acad. Sci. U. S. A. 104, 7122–7127.
O'Brien, K.P., Remm, M., Sonnhammer, E.L., 2005. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 33, D476–D480.
Ohno, S., 1970. Evolution by Gene Duplication. George Allen and Unwin, London.
Panopoulou, G., Poustka, A.J., 2005. Timing and mechanism of ancient vertebrate genome duplications. The adventure of a hypothesis. Trends Genet. 21, 559–567.
Papp, B., Pal, C., Hurst, L.D., 2003a. Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197.
Papp, B., Pal, C., Hurst, L.D., 2003b. Evolution of cis-regulatory elements in duplicated genes of yeast. Trends Genet. 19, 417–422.
Seoighe, C., Gehring, C., 2004. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 20, 461–464.
Shiu, S.H., Byrnes, J.K., Pan, R., Zhang, P., Li, W.H., 2006. Role of positive selection in the retention of duplicate genes in mammalian genomes. Proc. Natl. Acad. Sci. U. S. A. 103, 2232–2236.
Tatusov, R.L., et al., 2001. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28.
Van de Peer, Y., 2004. Computational approaches to unveiling ancient genome duplications. Nat. Rev., Genet. 5, 752–763.
Vandepoele, K., De Vos, W., Taylor, J.S., Meyer, A., Van de Peer, Y., 2004. Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc. Natl. Acad. Sci. U. S. A. 101, 1638–1643.
Vandepoele, K., Simillion, C., Van de Peer, Y., 2002. Detecting the undetectable: uncovering duplicated segments in Arabidopsis by comparison with rice. Trends Genet. 606–608.
Wang, Y., Gu, X., 2000. Evolutionary patterns of gene families generated in the early stage of vertebrates. J. Mol. Evol. 51, 88–96.
Wolfe, K., 2000. Robustness it's not where you think it is. Nat. Genet. 25, 3–4.
Wolfe, K.H., Shields, D.C., 1997. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713.
Yang, J., Lusk, R., Li, W.H., 2003. Organismal complexity, protein complexity, and gene duplicability. Proc. Natl. Acad. Sci. U. S. A. 100, 15661–15665.
Zhang, J., Nei, M., 1996. Evolution of Antennapedia-class homeobox genes. Genetics 142, 295–303.
Zhang, Z., Gu, J., Gu, X., 2004. How much expression divergence after yeast gene duplication could be explained by regulatory motif evolution? Trends Genet. 20, 403–407.
Zheng, X.H., Lu, F., Wang, Z.Y., Zhong, F., Hoover, J., Mural, R., 2005. Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics 6, 703–710.