Journal of Invertebrate Pathology 101 (2009) 169–171
Journal of Invertebrate Pathology journal homepage: www.elsevier.com/locate/yjipa
Lateral gene transfer, lineage-specific gene expansion and the evolution of Nucleo Cytoplasmic Large DNA viruses
Jonathan Filée
Laboratoire Evolution, Génomes et Spéciation, CNRS UPR 9034, Avenue de la Terrasse, 91198 Gif sur Yvette cedex, France
Article info
Article history: Received 26 February 2009; Accepted 13 March 2009; Available online 18 May 2009
Keywords: Nucleo Cytoplasmic Large DNA viruses (NCLDVs); Genome evolution; Lateral gene transfer
Abstract
Nucleo Cytoplasmic Large DNA viruses (NCLDVs) are a diverse group that infects a wide range of eukaryotic hosts (for example, vertebrates, insects, protists, ...) and shows a huge range in genome size (between 100 kb and 1.2 Mb). Here I review some recent results that shed light on the origin and genome evolution of these viruses. Current data suggest that NCLDVs could have originated from a simple and ancient viral ancestor with a small subset of 30–35 genes encoding replication and structural proteins. Subsequent lateral gene transfer of both cellular genes and diverse families of Mobile Genetic Elements, followed by massive lineage-specific gene duplications, is probably responsible for the huge diversity of genome size and composition found in extant NCLDVs.
© 2009 Elsevier Inc. All rights reserved.
1. Introduction
DNA viruses are a ubiquitous component of the biosphere; their number exceeds that of cellular organisms by at least one order of magnitude (Fuhrman, 1999). This abundance is accompanied by an extraordinary diversity in genome structure and composition (Suttle, 2005). Among the viruses, Nucleo Cytoplasmic Large DNA viruses (NCLDVs) are among the most commonly encountered in a large variety of biotopes (Chen et al., 1996; Larsen et al., 2008; Monier et al., 2008). They are an extremely diverse group whose members infect a wide range of eukaryotic hosts including algae (Phycodnaviruses), protists (Mimivirus) and metazoa (Poxviruses, African Swine Fever Virus, Iridoviruses). NCLDVs are characterised by a large range in genome size (between 100 kb and 1.2 Mb) and, based on a small set of 30 common homologous (core) genes, are thought to be monophyletic (Iyer et al., 2006). The majority of core genes encode enzymes involved in DNA metabolism and replication, or viral structural proteins. NCLDVs encode a B family DNA polymerase, a type II or, rarely, type Ia DNA topoisomerase, a viral type DNA primase, a sliding clamp/PCNA, a DNA ligase, a dUTPase and a thymidine kinase. NCLDVs therefore encode a nearly complete DNA replication apparatus in addition to key enzymes involved in the final steps of DNA metabolism. In addition to DNA replication and metabolism genes, core NCLDV genes also include structural genes and genes involved in transcription (for example, RNA polymerase subunits; Iyer et al., 2006).
E-mail address: jonathan.fi[email protected]
0022-2011/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.jip.2009.03.010
These 30 core genes represent a minor component of the genome; the major component of NCLDV genomes is therefore lineage-specific, and its origin is poorly understood. Interestingly, almost all NCLDV lineages include genes with bacterial and eukaryotic homologues (Iyer et al., 2006; Van Etten, 2003). These genes may testify to an ancient origin predating the divergence of the three kingdoms of life, such that each NCLDV lineage has retained a diverse assemblage of genes shared with bacteria and eukaryotes (Raoult et al., 2004). Alternatively, NCLDVs such as the Mimivirus could be regressive or highly derived cellular organisms that have undergone a genome simplification process (Raoult et al., 2004). These hypotheses contrast with the traditional view in which viruses and phages are thought to have evolved mainly by acquisition of genes from disparate sources (Hendrix et al., 1999; Moreira and Lopez-Garcia, 2005). A series of recent papers has discussed the respective roles of vertical and lateral gene inheritance in the evolution of the NCLDVs. In this review, I synthesise these recent developments and show that lateral gene transfer is a fundamental process driving the evolution of the NCLDVs.
2. Evolution of NCLDV core genes
NCLDV genomes contain a set of “core” genes shared by almost all family members. The core is composed of 30–35 genes, encoding viral structural proteins and enzymes involved in DNA metabolism and replication. In phylogenetic analyses of the replication genes, the viral genes generally clustered together, more often at the base of the tree and far from related cellular sequences, suggesting that replication genes were present in the NCLDV ancestor and have evolved independently, rarely
affected by lateral gene transfer (Filee et al., 2008). The situation for protein-encoding genes involved in DNA metabolism is very different: numerous lateral gene transfer events followed by homologous or non-homologous replacement have been identified. At least 12 potential gene transfer events from the host, possibly with replacement of the original copy, have been detected (Filee et al., 2008). As well as DNA replication and metabolism genes, core NCLDV genes include structural genes. Several of these, such as the capsid protein gene, have no homologues in cellular sequences but display a highly specific fold also found in the capsids of other eukaryotic and prokaryotic viruses (Benson et al., 2004). This observation suggests that these capsids are homologous, possibly inherited from an ancient common ancestor. It has been suggested that these capsid-encoding genes could be considered “hallmark viral genes” derived from an ancient virus world (Koonin et al., 2006). However, core genes represent only a very small fraction of the total genome content of the NCLDVs (150–900 genes). This implies massive genome expansion and diversification or, alternatively, massive genome reduction within each lineage from a complex ancestor (perhaps a cellular ancestor) (Raoult et al., 2004). In order to better understand the mechanism of NCLDV genome evolution, it is therefore necessary to examine in more detail the “non-core” gene component of the genome.
3. Lateral transfer of cellular genes to the NCLDV genome
The importance of lateral gene flow during viral evolution has been a matter of intense debate. Several authors consider viruses as “bags of genes” that frequently recombine with other viruses and acquire (and lose) genes from their hosts (Hendrix et al., 2000; Moreira and Lopez-Garcia, 2005). This view has been challenged by the discovery of complex viruses such as NCLDVs whose genes are only distantly related to their host counterparts.
Analysis of NCLDV “non-core” genes suggests a massive flow of laterally transferred genes from a large variety of sources, for example, gene transfer from the virus host in the case of the metazoan-infecting Poxviruses (Filee et al., 2008). Moreover, genes that were more similar to host copies tended to be clustered at the tips of the (linear) genome, which is thought to be a highly recombinogenic region in these viruses (Esposito
et al., 2006). Iridoviruses and the Asfarvirus have fewer genes with eukaryotic affinities than the Poxviruses, but a significant fraction of these were also more closely related to their host homologue than to any other homologue (Table 1). Despite having larger genomes, the Chlorella Phycodnaviruses and the Mimivirus have the lowest proportion of genes (number/genome length) of potentially eukaryotic origin. Only a handful of these eukaryotic-like genes appear to be closely related to their host counterpart (Filee et al., 2008). These data indicate that among the NCLDVs, Poxviruses have the strongest tendency to exchange genes with their host. The genomic structure of NCLDVs that infect Chlorella algae (Phycodnaviruses) and Amoeba (Mimivirus) suggests a more complex scenario. These genomes have a large number of genes with high similarity to bacterial genes (Table 1; Filee et al., 2007). It has been shown that Chlorella Phycodnavirus and Mimivirus genomes have 48–57 genes and 96 genes, respectively, that are unambiguously of bacterial origin (Table 1). These genes tend to be clustered in islands towards the extremities of the genomes and co-localise with bacterial-like Insertion Sequences. It was suggested that their eukaryotic hosts, which graze on bacteria, could provide an “ecological” niche for viral access to bacterial gene pools. The Phycodnaviruses analysed infect Chlorellae, which in turn live in symbiosis with Paramecia or Heliozoa from the genus Acanthocystis, while the Mimivirus infects Amoebae directly. More recently, a phylogenomic study of the Mimivirus genome has raised new questions about the massive acquisition of cellular genes from bacteria and hosts. The authors suggested that several Mimivirus genes have been acquired from other protists such as the Heterolobosea or Kinetoplastida (Moreira and Brochier-Armanet, 2008). On the other hand, many NCLDV lineages infect metazoa or algae that do not prey on bacteria.
Poxviruses, Iridoviruses, the Asfarvirus and other Phycodnaviruses of free-growing algae have considerably fewer bacterial-like genes than Chlorella Phycodnaviruses and the Mimivirus. In viruses of metazoa and free-growing algae, these genes tend to be scattered in the genome, with no apparent clustering in islands towards the extremities of the genomes (Filee et al., 2007). It has been suggested that, among the NCLDVs, there is a propensity to acquire genes from the host and that a significant fraction of the diversity and the size of the genomic repertoire of large DNA viruses can be explained by lateral gene transfer. Interestingly, some of the laterally acquired genes are mobile genetic elements.
Table 1 Summary of the main evolutionary characteristics of NCLDV genomes. Mimivirus is indicated in black, Phycodnaviruses in green, Iridoviruses in blue, Asfarvirus in orange and Poxviruses in red.
4. The importance of mobile elements and lineage-specific gene expansion
Sequencing of a diverse set of NCLDV genomes showed that they contain a diverse assemblage of mobile genetic elements (MGEs) (Filee et al., 2007). Phycodnavirus and Mimivirus genomes include Insertion Sequences (ISs) typically found in Bacteria and Archaea. These elements belong to the IS200/IS605 and the IS4 families (Filee et al., 2007) and were probably acquired via horizontal gene transfer from prokaryotic sources. These genomes also encode multiple mobile endonucleases, sometimes associated with self-splicing introns. Most of these belong to the HNH family. Comparison of closely related Chlorella Phycodnavirus genomes shows extensive variation in the number and location of HNH endonuclease genes, suggesting that they have or had the capacity to excise, insert, and spread within the host genome or alternatively to transfer into a new one (Filee et al., 2007). Chlorella Phycodnaviruses also encode several mobile endonucleases which show weak similarity to the GIY-YIG family (Fitzgerald et al., 2007). Finally, NCLDVs also encode a limited number of Inteins (Filee et al., 2007), protein segments able to excise themselves and rejoin the flanking peptides. The exact role and origin of these multiple MGEs remain unknown. It is unclear what forces are involved in the maintenance of these elements and whether they provide selective advantages for the virus (or viral host). As noted above, ISs found in NCLDVs co-localise with bacterial-like genes and may promote lateral gene transfer. If this is the case, it seems unlikely that they carry “foreign” DNA segments in the form of transposons, since they appear to be located within bacterial-like segments rather than at bacterial–viral junctions. Mobile endonucleases may also promote genomic recombination. Some of these elements have undergone frequent events of transposition and/or duplication to reach numerous copies in a given genome.
This phenomenon, called lineage-specific gene (family) expansion, is a common evolutionary mechanism in cellular organisms (Jordan et al., 2001; Lespinet et al., 2002). The role of lineage-specific expansion in the evolution of NCLDVs was first reported by Iyer et al. (2006). All NCLDV lineages displayed evidence of gene duplication, and genes with a wide range of functions were affected (Filee and Chandler, 2008) (Table 1). Moreover, there appeared to be a general correlation between the size of the genome and the number of duplicated genes. Small Iridoviruses and Poxviruses have fewer paralogs than large Phycodnaviruses and the Mimivirus. The latter has 398 paralogous genes divided into 86 families. This amounts to more than 43% of the 900 genes representing the genomic complexity of the Mimivirus (Table 1). In addition to single gene duplication, Suhre (2005) reported segmental duplications in the Mimivirus that have affected a large portion of the genome (duplication of two fragments of 80 kb and 110 kb at the tips of the genome). Taken together, these results suggest that gene duplication and mobile element transposition are important forces in the evolution of NCLDVs, and that recent lineage-specific expansion of genes is responsible for a large part of the genomic complexity of these large DNA viruses.
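The proportion quoted above for the Mimivirus can be checked with a short calculation. The sketch below uses only the three figures given in the text (398 paralogous genes, 86 families, about 900 genes in total). The function names are illustrative. The "net gain" figure rests on an assumption that is not stated in the source: that each family started from a single founder copy, so that only the remaining copies count as genes added by duplication.

```python
# Figures quoted in the text for the Mimivirus.
TOTAL_GENES = 900        # approximate total gene count given in the text
PARALOGOUS_GENES = 398   # genes belonging to paralogous families
PARALOG_FAMILIES = 86    # number of paralogous families


def paralog_fraction(paralogs: int, total: int) -> float:
    """Share of the gene repertoire that belongs to paralogous families."""
    return paralogs / total


def net_duplication_gain(paralogs: int, families: int) -> int:
    """Copies beyond one founder per family.

    Assumption (not from the source): every family descends from a
    single founder gene, so the gain is paralogs minus families.
    """
    return paralogs - families


fraction = paralog_fraction(PARALOGOUS_GENES, TOTAL_GENES)
gain = net_duplication_gain(PARALOGOUS_GENES, PARALOG_FAMILIES)

print(f"Paralogous share of the genome: {fraction:.1%}")
print(f"Net gain from duplication (one founder per family): {gain} genes")
```

Under these figures the paralogous share comes out slightly above 44%, which is consistent with the statement "more than 43%". The founder-based count is only one possible way of estimating how many genes duplication has added.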
5. Conclusions
NCLDV genomes are composed of a small number of ancient core genes (30–35 genes) which are predominantly vertically inherited, with little or no lateral gene transfer involved (Filee et al., 2008). Thus, the large size and diversity of their genomes seem to be recent and to result from two major evolutionary forces: (i) lateral acquisition of genes from cellular organisms and (ii) duplication and lineage-specific expansion of families of genes, including a large variety of mobile elements. Perhaps the most striking example of such massive genome growth is the Mimivirus. Of a total of more than 900 genes, there are 300 paralogous genes including MGEs (398 duplicated genes divided into 86 families) specific to the lineage (Suhre, 2005) and more than 100 laterally acquired bacterial- and eukaryote-like genes (Filee et al., 2007; Moreira and Brochier-Armanet, 2008). Thus, more than 40% of the Mimivirus genome is of recent origin (i.e. acquired after the divergence of the NCLDV lineages). Such massive gene duplication and lateral gene acquisition appear to contradict the hypothesis that genome reduction is the basis of the origin of the Mimivirus and other NCLDVs.
Acknowledgments
I would like to thank M. Chandler, C. Metcalfe and members of the Transposable Elements team at LEGS (P. Capy, A. Hua-Van, J.D. Rouault, T. Boutin, I. Clavereau, E. Robillard and R. Stamboliyska) for fruitful discussion and E. Herniou for critical reading of the manuscript.
References
Benson, S.D., Bamford, J.K., Bamford, D.H., Burnett, R.M., 2004. Does common architecture reveal a viral lineage spanning all three domains of life? Mol. Cell 16, 673–685.
Chen, F., Suttle, C.A., Short, S.M., 1996. Genetic diversity in marine algal virus communities as revealed by sequence analysis of DNA polymerase genes. Appl. Environ. Microbiol. 62, 2869–2874.
Esposito, J.J., Sammons, S.A., Frace, A.M., Osborne, J.D., Olsen-Rasmussen, M., Zhang, M., Govil, D., Damon, I.K., Kline, R., Laker, M., Li, Y., Smith, G.L., Meyer, H., Leduc, J.W., Wohlhueter, R.M., 2006. Genome sequence diversity and clues to the evolution of variola (smallpox) virus. Science 313, 807–812.
Filee, J., Chandler, M., 2008. Convergent mechanisms of genome evolution of large and giant DNA viruses. Res. Microbiol. 159, 325–331.
Filee, J., Pouget, N., Chandler, M., 2008. Phylogenetic evidence for extensive lateral acquisition of cellular genes by Nucleocytoplasmic large DNA viruses. BMC Evol. Biol. 8, 320.
Filee, J., Siguier, P., Chandler, M., 2007. I am what I eat and I eat what I am: acquisition of bacterial genes by giant viruses. Trends Genet. 23, 10–15.
Fitzgerald, L.A., Graves, M.V., Li, X., Feldblyum, T., Nierman, W.C., Van Etten, J.L., 2007. Sequence and annotation of the 369-kb NY-2A and the 345-kb AR158 viruses that infect Chlorella NC64A. Virology 358, 472–484.
Fuhrman, J.A., 1999. Marine viruses and their biogeochemical and ecological effects. Nature 399, 541–548.
Hendrix, R.W., Lawrence, J.G., Hatfull, G.F., Casjens, S., 2000. The origins and ongoing evolution of viruses. Trends Microbiol. 8, 504–508.
Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., Hatfull, G.F., 1999. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc. Natl. Acad. Sci. USA 96, 2192–2197.
Iyer, L.M., Balaji, S., Koonin, E.V., Aravind, L., 2006. Evolutionary genomics of Nucleo-Cytoplasmic Large DNA viruses. Virus Res. 117, 156–184.
Jordan, I.K., Makarova, K.S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., 2001. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res. 11, 555–565.
Koonin, E.V., Senkevich, T.G., Dolja, V.V., 2006. The ancient Virus World and evolution of cells. Biol. Direct 1, 29.
Larsen, J.B., Larsen, A., Bratbak, G., Sandaa, R.A., 2008. Phylogenetic analysis of members of the Phycodnaviridae virus family, using amplified fragments of the major capsid protein gene. Appl. Environ. Microbiol. 74, 3048–3057.
Lespinet, O., Wolf, Y.I., Koonin, E.V., Aravind, L., 2002. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12, 1048–1059.
Monier, A., Claverie, J.M., Ogata, H., 2008. Taxonomic distribution of large DNA viruses in the sea. Genome Biol. 9, R106.
Moreira, D., Brochier-Armanet, C., 2008. Giant viruses, giant chimeras: the multiple evolutionary histories of Mimivirus genes. BMC Evol. Biol. 8, 12.
Moreira, D., Lopez-Garcia, P., 2005. Comment on “The 1.2-megabase genome sequence of Mimivirus”. Science 308, 1114; author reply 1114.
Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M., 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306, 1344–1350.
Suhre, K., 2005. Gene and genome duplication in Acanthamoeba polyphaga Mimivirus. J. Virol. 79, 14095–14101.
Suttle, C.A., 2005. Viruses in the sea. Nature 437, 356–361.
Van Etten, J.L., 2003. Unusual life style of giant chlorella viruses. Annu. Rev. Genet. 37, 153–195.