Infection, Genetics and Evolution 6 (2006) 315–322 www.elsevier.com/locate/meegid
Pattern of gene duplication in the Cotesia congregata Bracovirus
Robert Friedman a, Austin L. Hughes b,*
a Bioinformatics Facility, University of Connecticut, Storrs, CT 06269, USA
b Department of Biological Sciences, University of South Carolina, Coker Life Sciences Bldg., 700 Sumter St., Columbia, SC 29208, USA
Received 26 August 2005; received in revised form 17 October 2005; accepted 18 October 2005 Available online 4 January 2006
Abstract
Polydnaviruses (PDVs) are a family of double-stranded DNA viruses genetically linked to their wasp hosts. These viruses use the transcriptional machinery of wasp cells to produce viral particles that contain circular segments of DNA. The female wasp, which carries the polydnavirus, lays its eggs along with the viral particles inside a caterpillar. Because the virus does not replicate inside the caterpillar, genetic changes can become fixed only within the female wasp, where the virus is carried as an integrated portion of the wasp genome. The evolution of the polydnavirus is therefore expected to parallel that of the wasp. Phylogenetic analysis of the polydnavirus genome revealed a pattern of gene duplication consistent with the “birth-and-death” process frequently observed in eukaryotic genomes. The phylogenies provided no unequivocal evidence of horizontal gene transfer between the wasp host and the polydnavirus, although some cases suggested such transfer.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Polydnavirus; Bracovirus; Horizontal gene transfer; Gene duplication
1. Introduction
Polydnaviruses (PDVs) are a family of double-stranded DNA viruses that are genetically linked with the genome of endoparasitic wasps (Edson et al., 1981). They are named for the fact that their genetic material is organized into multiple circular DNA segments, in contrast to the single DNA or RNA molecule of many known viruses (Stoltz et al., 1984). PDV particles (virions) reside in cells of the endoparasitic wasp and are transferred to its caterpillar host during oviposition (see reviews by Federici and Bigot, 2003; Drezen et al., 2003; Beckage and Gelman, 2004; Kroemer and Webb, 2004). PDVs are divided into two subfamilies according to the wasp lineage with which they are associated: (1) bracoviruses with braconid wasps and (2) ichnoviruses with ichneumonid wasps. There is neither morphological nor genetic evidence that these two subfamilies are evolutionarily related (Espagne et al., 2004), so their similar life histories may have arisen by convergent evolution. The life cycle of PDVs is unusual in that the viral genes are inherited solely as an integrated portion of the endoparasitic wasp genome. This suggests that the pattern of evolution in the virus will resemble that of the wasp. PDV is transmitted when the wasp lays its eggs inside the caterpillar host; however, PDV does not replicate further in this host. It is not known for certain how the virions hide the wasp eggs from host immunity, although the virions are known to be attached to the surface of the eggs during oviposition. Espagne et al. (2004) likened this process to the wasp deploying a “biological weapon” inside the caterpillar: the released viral particles invade the host's cells and thereby dampen its immune response to the wasp eggs. The analogy is apt because these viral particles do not contribute to the next generation of PDVs. Instead, the next generation is encoded as an integral part of the genome carried by each wasp egg. This contrasts with most other viruses, which produce progeny through transmission and replication of viral particles inside a host cell. Federici and Bigot (2003) suggested that the symbiotic relationship between PDV and wasp resembles less that of two organisms living together than that of an organelle and its cell. The reason is that the PDV genes are linked with the wasp genome and no replication occurs inside the cells of the host caterpillar. However, virions do invade the host cells and use the cellular machinery to produce proteins that benefit both the PDV and the wasp
* Corresponding author. Tel.: +1 803 777 9186; fax: +1 803 777 4002. E-mail address: [email protected] (A.L. Hughes).
1567-1348/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2005.10.001
R. Friedman, A.L. Hughes / Infection, Genetics and Evolution 6 (2006) 315–322
(Kroemer and Webb, 2004). There is mounting evidence that many eukaryotic organelles arose from the merger of a host organism with an engulfed single-celled organism (Williams et al., 2002). One may therefore ask whether viruses have evolved symbioses with cells in a similar fashion (Federici and Bigot, 2003). Alternatively, the PDV may have no viral ancestor (Iyer et al., 2001; Espagne et al., 2004). It may instead represent the emergence of a new virus from a segment (Belle et al., 2002) or segments of the wasp genome. Here we address these questions through evolutionary analyses of the recently sequenced PDV Cotesia congregata Bracovirus (CcBV; Espagne et al., 2004) and of related genes from other organisms. Such analysis is especially useful for a complete genome sequence, because the presence or absence of homologs can then be determined (Hughes and Friedman, 2003). Gene duplication has occurred in CcBV, and homologs can be found in other organisms, so phylogenetic trees can be constructed to establish the pattern of duplication. We also tried to determine the timing and origin of the duplicated genes, that is, whether they were acquired horizontally by gene transfer or inherited vertically through gene duplication.

2. Methods
2.1. Sequence data
All protein-coding sequences (156 CDSs) of the C. congregata Bracovirus (Espagne et al., 2004) were downloaded from the GenBank and EMBL databases. The CcBV genes were
renamed with their database accession number followed by an underscore and a number giving the relative order of the gene on its DNA circle. To find sequences in other species similar to CcBV genes, the non-redundant (NR) set of protein sequences was obtained from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.tar.gz). Each of these sequences was renamed with its species name and its database gi (unique identifier) number.

2.2. Homology search
Homologous sequences were identified with the BLAST software (Version 2.2.9; Altschul et al., 1997). To group the CcBV genes into families of related amino acid sequences, Blastclust was first run with default parameters. The one exception was that pairwise alignments had to show at least 30% similarity across at least 50% of the sequence lengths (30/50). We have previously found that these criteria assemble biologically meaningful multi-gene families (Hughes et al., 2005). A more relaxed criterion was also applied, grouping the genes into families on the basis of a simple BlastP search with an E-value cut-off of 10⁻²⁰. Both procedures resulted in fairly similar groupings of genes. To search for homologs in the NCBI non-redundant database, Blastclust was again run with default parameters, except that the E-value threshold was set at 10⁻²⁰ and the 30/50 criterion was applied as above. This low E-value threshold was chosen to exclude sequences without a phylogenetic signal, so homologous genes that met these criteria are considered putative
Fig. 1. Diagram and phylogeny of (A) a tandem gene duplication (contiguous) and (B) a segmental duplication event (not contiguous). Each diagram shows the positions of two genes along a chromosome before and after a duplication, together with a phylogeny showing the branching pattern of those genes. The dashed line in diagram (B) represents the absence of contiguity (genetic linkage) between the original and duplicated genes.
orthologs. For the largest gene family, the protein tyrosine phosphatases (PTPs), only representative taxa were retained because of the large number of homologs in eukaryotic species. This procedure resulted in gene family memberships slightly different from those reported by Espagne et al. (2004). Those authors generally grouped genes into families on the basis of shared functional protein domains, even when sequence similarity was low. However, the purpose of this study was to establish the evolutionary
relationships among genes, so our criteria required similarity across a large portion of the aligned regions. Nevertheless, because statistical support was required to identify gene duplication events, the results are robust to small changes in the criteria used to cluster genes into families.

2.3. Sequence and phylogenetic analysis
The amino acid sequences of each gene family were aligned with the ClustalX software (Version 1.83;
Fig. 2. Unrooted phylogenetic tree of the protein tyrosine phosphatase gene family in the Cotesia congregata Bracovirus (CcBV) and of homologs in other organisms. The scale is in amino acid substitutions per site. Bootstrap support values are shown next to the internal branches. Putative gene duplicates are marked by a solid circle at the node and labeled with the hypothesis (see Fig. 1) that explains the duplication event.
Fig. 3. Unrooted phylogenetic tree of the ankyrin gene family in CcBV. See Fig. 2 for label descriptions.
Thompson et al., 1997). If an alignment contained more than three sequences, a phylogenetic tree was constructed with the MEGA2 software (Kumar et al., 2001) by the neighbor-joining method (Saitou and Nei, 1987), using the Poisson model of sequence evolution. The Poisson model corrects for unobserved substitutions between sequences, but it assumes an equal evolutionary rate among sites and equal amino acid frequencies. To verify that this model was reliable, the phylogenies were reconstructed as above but with the assumption of rate constancy among sites relaxed. The reliability of clustering patterns in the phylogenetic trees was assessed by bootstrapping (Felsenstein, 1985) with 1000 bootstrap pseudo-samples. Divergence between duplicate genes was estimated as the percentage of synonymous substitutions per synonymous site (dS; Nei and Gojobori, 1986). Pairs with
dS > 80% were excluded from these analyses. We wrote Perl software to implement this divergence measure, convert sequence formats and perform other routine tasks.

2.4. Identifying gene duplication events
To infer gene duplication events in CcBV, we assumed the minimum number of evolutionary events (tandem gene duplication or segmental duplication). Recombination is not known to occur in CcBV, so it was ignored as a potential force in assembling the DNA circles from the wasp genome (Drezen et al., 2003). For the putative gene duplicates defined in this study, there are two hypotheses of gene duplication: tandem and segmental (as defined by Leister, 2004). Tandem gene duplication (Fig. 1A) is a duplication event that
Fig. 4. Unrooted phylogenetic trees of the (A) f1 and (B) capsid gene families in CcBV. See Fig. 2 for label descriptions.
results in the original and copied gene being contiguous along the chromosome. Segmental duplication (Fig. 1B) is the duplication of a block of genes such that the original and copied regions are not linked.

3. Results
3.1. Patterns of gene duplication
The largest gene family in CcBV was the protein tyrosine phosphatases (Fig. 2). Five gene duplicates were identified by significant bootstrap values supporting their respective clades. Each pair of duplicate genes was assigned to one of three categories: (1) tandem gene duplication (Fig. 1A); (2) segmental duplication (Fig. 1B); or (3) unresolved between the two hypotheses. One pair favored the hypothesis of tandem gene duplication because both genes lie on the same DNA circle. The other pairs were considered unresolved because their genes are located on different circles. Explaining the location of duplicate genes on different circles presumably requires a complicated scenario involving either a translocation event or multiple gene losses. A second clear example of tandem duplication was seen in the phylogeny of ankyrin genes (Fig. 3). Two examples of segmental duplication were found, in the f1 (Fig. 4A) and capsid (Fig. 4B) gene families. The f1 gene family contained two pairs of closely related genes, each consisting of a gene on circle 9 and a gene on circle 33 (Fig. 4A). The phylogenetic pattern of these four genes can be explained by a single event in which two ancestrally unlinked genes were duplicated simultaneously (segmental duplication; hypothesis 2). Similarly, the capsid gene family contained two pairs of closely related genes, each consisting of a gene on circle 22 and a gene on circle 36 (Fig. 4B). Here again, the topology supported the segmental duplication hypothesis. As described in the Methods, putative gene duplicates were identified by phylogenetic clustering. For gene families with too few members to construct a phylogeny, they were identified by sequence identity.
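The decision rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Perl software. The sketch assumes the following:

- Each gene is named `<accession>_<order>` as in Section 2.1, and each accession corresponds to one DNA circle.
- The accessions (ACC09, ACC33 and so on) and the accession-to-circle table are hypothetical.
- The input pairs have already received significant bootstrap support.

A pair whose two genes lie on one circle is labeled tandem, as in the text, which checks only same-circle membership and not adjacency. A pair spanning two circles is labeled segmental only when another pair spans the same two circles, which is the single-event explanation used for the f1 and capsid families. Any other pair is labeled unresolved.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def circle_of(gene_id: str, accession_to_circle: Dict[str, int]) -> int:
    """Return the DNA circle of a gene named '<accession>_<order>'."""
    accession, _order = gene_id.rsplit("_", 1)
    return accession_to_circle[accession]


def classify_pairs(pairs: List[Pair],
                   accession_to_circle: Dict[str, int]) -> Dict[Pair, str]:
    """Label each duplicate pair 'tandem', 'segmental' or 'unresolved'.

    Illustrative minimum-events heuristic (an assumption, not the authors' code):
    - both genes on the same circle -> one tandem duplication;
    - two or more pairs linking the same two circles -> one segmental
      duplication explains all of them;
    - a lone pair on different circles -> unresolved.
    """
    spans: Dict[Pair, Tuple[int, int]] = {}
    for a, b in pairs:
        ca = circle_of(a, accession_to_circle)
        cb = circle_of(b, accession_to_circle)
        spans[(a, b)] = (min(ca, cb), max(ca, cb))  # order-independent key

    # Count how many between-circle pairs connect each pair of circles.
    links = Counter(s for s in spans.values() if s[0] != s[1])

    labels: Dict[Pair, str] = {}
    for pair, (c1, c2) in spans.items():
        if c1 == c2:
            labels[pair] = "tandem"
        elif links[(c1, c2)] >= 2:
            labels[pair] = "segmental"
        else:
            labels[pair] = "unresolved"
    return labels


if __name__ == "__main__":
    # Hypothetical accessions; circles 9 and 33 mirror the f1 example.
    circles = {"ACC09": 9, "ACC33": 33, "ACC07": 7, "ACC05": 5, "ACC11": 11}
    pairs = [
        ("ACC09_1", "ACC33_2"),  # two pairs linking circles 9 and 33
        ("ACC09_3", "ACC33_4"),
        ("ACC07_1", "ACC07_2"),  # both copies on circle 7
        ("ACC05_1", "ACC11_1"),  # a lone between-circle pair
    ]
    for pair, label in classify_pairs(pairs, circles).items():
        print(pair, label)
```

The last category is deliberately conservative. A single between-circle pair could reflect a translocation or multiple gene losses, so the sketch, like the text, leaves it unresolved rather than forcing a hypothesis.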
With frequent tandem duplication, one would expect recently duplicated genes to be linked on the same circle, whereas older duplicates would, over time, have a greater chance of becoming unlinked by recombination. However, this pattern was not apparent for duplicate gene pairs with dS below 80%. Rather, the frequency distribution of dS values for gene duplicates on the same circle was similar to that for duplicates on different circles (Fig. 5).

3.2. Origins of CcBV genes
Evidence for the origin of the viral genes was compiled by surveying for homologs in other organisms (Table 1). Not unexpectedly, many of the matches were to other bracoviruses and to their sister group, the ichnoviruses. According to Whitfield (2002), the bracoviruses and the ichnoviruses each form a separate monophyletic group. Only two phylogenies (Figs. 2 and 4A) showed a homologous gene from another species clustering with a CcBV gene. In
Fig. 5. Bar chart of the number of gene duplicates on the same and on different circles for different ranges of sequence divergence. Sequence divergence is measured as the percentage of synonymous substitutions per synonymous site (dS). Each gene pair was placed in a bin according to its sequence divergence and the relative location of its two genes on the DNA circles.
these two cases, the homologous genes were present in a bracovirus or an ichnovirus. The amino acid sequence homology search also revealed several putative non-PDV homologs of CcBV genes. For instance, gene family 21 had a single match to a protein in the amoeba Naegleria gruberi. Another gene family, #23, had two matches to nucleopolyhedroviruses (baculoviruses). Family 29, unusual for its large number of homologs, is a histone H4-like gene with homologs in many eukaryotic organisms; only representative samples from different taxa are shown in Table 1. This case was unique in that the homology search found no evidence of other viruses possessing histone H4-like genes. This suggests that an ancestral bracovirus captured the gene from a cellular organism. The protein tyrosine phosphatases (family 1; Table 1) had homologs in other bracoviruses, Anopheles (mosquitoes), Apis (bees) and Geodia (sponges). However, because the phylogenetic tree (Fig. 2) was unrooted, it did not provide unambiguous support for horizontal transfer of a PTP gene to CcBV from a cellular organism. Nonetheless, the homology of the CcBV PTPs to those of insects (Fig. 2) was consistent with horizontal transfer of a PTP gene to an ancestral bracovirus from its insect host. Although phylogenetic support was absent, CcBV gene families 18, 24 and 27 had homologs in Caenorhabditis briggsae. Family 18 also showed a match to a Parvo-like virus gene, and family 27 to a gene in the bony fish Xiphophorus maculatus (Table 1). All these homologs were listed in the NCBI database as proteins of unknown biological function. Because the proteins lacked a putative functional annotation in the database, a more thorough search for less similar proteins was performed to characterize them. For example, family 24 showed a domain similar to those annotated as DNA packaging domains. The others, families 18 and 27, matched with
Table 1
List of genes in the Cotesia congregata Bracovirus that were identified as products of gene duplication or that have a similar protein sequence in another species
Family
Genes (paralogs in parentheses)
Location
Homologs in other species (gi# in parentheses)
Molecule description
Glyptapanteles indiensis BV (30725100)a; Apis mellifera (48119192, 48096511); Anopheles gambiae (31240897, 31199365); Geodia cydonium (13276131); Cotesia glomerata BV (39726084) Hyposoter didymator IV (41323343)a; Hyposoter fugitivus IV (46370994)a None None
Protein tyrosine phosphatase (PTP)b
1
See phylogeny
2
See phylogeny
3 4
See phylogeny See phylogeny
5
(AJ632305_1, AJ632329_5); (AJ632321_2 (AJ632321_5, AJ632326_3)); AJ632328_3 (AJ632325_1, AJ632326_2); (AJ632312_1, AJ632331_2); AJ632306_3; AJ632306_4; AJ632309_1; AJ632315_2; AJ632320_6; AJ632322_2 (AJ632304_3, AJ632304_5) (AJ632307_1, AJ632307_2); AJ632313_5 (AJ632320_5, AJ632330_3) (AJ632321_1, AJ632321_4, AJ632321_6) (AJ632324_5, AJ632333_3) (AJ632305_4, AJ632329_1) (AJ632320_1, AJ632328_1) (AJ632305_5, AJ632329_6) (AJ632312_2, AJ632331_1) (AJ632308_2, AJ632308_3) (AJ632314_3, AJ632332_1) AJ632329_2
c2, c19, c25, c30, c31
Cotesia kariyai BV (27260917)
Unknown (f1) b Putative capsid-like protein; lectin c-type domain Soluble protein (f2)b
c3, c6, c9, c12, c18, c20, c23, c25, c33
None
Unknown (hp2)b
c1, c7
None
EP1-likeb
c4, c10
None
Unknown
c18
Cotesia glomerata BV (40019102)
Cysteine-rich (crp)b
c19
None
Cysteine protease inhibitor activity (cyst) b
c22, c36
None
Unknown
c2, c31
Cotesia rubecula BV (49175818)
Unknown
c18, c30
None
Unknown (hp)b
c2, c31
None
Unknown
c9, c33
None
Unknown
c5
None
Unknown
c11, c35
None
Unknown
c31
DNApol B2 domain
AJ632324_3 (AJ632325_4, AJ632326_4) (AJ632320_3, AJ632328_2) AJ632316_5
c22 c23, c25
Parvo-like virus (6063476); Caenorhabditis briggsae (39578791, 39579391, 39579490, 39579561, 39579653, 39579962, 39580270, 39580665, 39583054, 39583193, 39586526, 39589007, 39595333, 39598377) Glyptapanteles indiensis BV (32810640) None
Unknown Ribonuclease T2
c18, c30
Naegleria gruberi (22086975, 22086978)
Unknown
c13
Cotesia rubecula BV (1293788, 13899004)
Glycosylated transmembrane protein
6
7 8
9 10
11 12 13 14 15 16 17 18
19 20 21 22
Ankyrin repeat domain (ank)b
Table 1 (Continued)
Family
Genes (paralogs in parentheses)
Location
Homologs in other species (gi# in parentheses)
Molecule description
23
AJ632310_1
c7
Unknown
24
AJ632329_4
c31
25
c13, c19
26 27
(AJ632316_3, AJ632321_3) AJ632308_5 AJ632329_3
Leucania separata nucleopolyhedrovirus (2760642); Mamestra configurata nucleopolyhedrovirus A (33331837) Caenorhabditis briggsae (39579232, 39579456, 39580272, 39583056) None
c5 c31
28
AJ632316_2
c13
29
AJ632310_3
c7
Cotesia kariyai BV (33331837) Caenorhabditis briggsae (39579459, 39579557, 39587282, 39583195, 39578934, 39588296, 39580273, 39591237, 39595736); Xiphophorus maculatus (1244746) Cotesia plutellae BV (41691715); Cotesia ruficrus BV (27753507); Cotesia kariyai BV (27753509); Cotesia rubecula BV (29470303) Representative taxa only: Stylonychia mytilus (21779960); Euplotes aediculatus (21779945); Drosophila melanogaster (51092065); Paracentrotus lividus (3287225); Homo sapiens (49457374); Schistosoma mansoni (10953803); Gallus gallus (50729214); Griffithsia japonica (32401025); Oikopleura dioica (26800918); Plasmodium falciparum (2062367)
DNA packaging domain Membrane protein EP1-like Integrase core domain
Lectin c-type domain
Histone H4-like
Abbreviations: c, DNA circle number; IV, ichnovirus; BV, bracovirus.
a The homologs cluster with a Cotesia congregata Bracovirus gene with bootstrap support >95%.
b The name corresponds to the gene family name given by Espagne et al. (2004).
minimal confidence to a DNA polymerase B2 domain and an integrase core domain, respectively.

4. Discussion
Phylogenetic analysis made it possible to distinguish between segmental and tandem gene duplication for certain C. congregata Bracovirus gene duplicates. The extensive duplication of genes was not surprising, given that CcBV is genetically linked to the wasp genome. It is thus reasonable to assume that the processes occurring in the CcBV genome are similar to those occurring in the wasp genome. The one surprising attribute of the virus was that nearly identical numbers of gene duplicates resided within and between DNA circles. In addition, the occurrence of duplicate genes on the same circle was independent of the time since duplication (Fig. 5). This pattern suggests that gene duplicates become separated onto different circles by recombination within a short time interval. A typical pattern of gene duplication in multi-gene families of eukaryotes is a “birth-and-death” process, whereby genes arise continually by duplication and are lost by deletion or by mutations causing loss of expression (Hughes and Nei, 1989; Nei et al., 1997). This pattern gives rise to lineage-specific gene duplication and to a distinct, lineage-specific composition of multi-gene families, and it has also been described in certain viruses. One example involves nanoviruses, whose genomes consist of multiple circular single-stranded DNA molecules that have duplicated independently in different nanovirus lineages (Hughes, 2004). Likewise, viruses with large double-stranded DNA genomes, such as poxviruses, include multi-gene families showing evidence of a process of duplication and deletion similar to that of eukaryotes, and these duplications are often lineage-specific. After gene duplication occurs, a newly duplicated gene will either be lost or become fixed in the population by genetic drift or natural selection (Hughes, 1994; Lynch et al., 2001). Fixation will be selectively favored if the duplicate gene specializes in a new function beneficial to the organism (Hughes, 1994, 1999). Given that CcBV is linked to the wasp genome, any benefit to the wasp larvae is expected to enhance the chance that a new gene duplicate is retained in the virus. Evidence of lateral transfer can only be established through the construction and interpretation of gene phylogenies. In the case of the polydnaviruses, the clustering of a non-PDV gene with an ichnovirus or bracovirus gene in a rooted phylogenetic tree would support a lateral gene transfer event. The mere presence of a non-PDV gene in a phylogeny of PDV genes is not sufficient to infer lateral gene transfer. Significant support from bootstrapping or a similar statistical test is also required. Our results thus provide only suggestions of lateral gene transfer, not definitive proof. Even though the PDV histone H4-like gene has many homologs in eukaryotes, there was no firm phylogenetic evidence of gene transfer. However, since the PDV genes are linked to the wasp genome, the histone H4-like gene may have been captured from the wasp. Such a “capture” event would not actually require gene transfer. A rearrangement that relocated a gene in the wasp genome to the portion that encodes
the PDV genes would suffice. So, even without firm evidence, this scenario is plausible, especially since the histone H4-like gene is not present in other viruses. According to Espagne et al. (2004), the PTP genes show a similar pattern; they reported PTP homologs in both baculoviruses and poxviruses. However, no such homologs were detected in this study because of the strict criteria used to establish homology. The ultimate evolutionary origin of these viruses remains an open question. One hypothesis is that the virus was derived from an ancestral free-living virus that eventually became incorporated into the wasp genome. In this case, the viral genome may be an amalgam of genes from the original virus and genes derived from the wasp. We look forward to future studies that experimentally determine whether all CcBV genes are linked on a single wasp chromosome. Such studies could also show how the genes on the DNA circles are organized relative to the wasp genome. Answers to these questions will help clarify the evolutionary history of these viruses.

Acknowledgment
This research was supported by grant GM43940 from the National Institutes of Health to A.L.H.

References
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Beckage, N.E., Gelman, D.B., 2004. Wasp parasitoid disruption of host development: implications for new biologically based strategies for insect control. Annu. Rev. Entomol. 49, 299–330.
Belle, E., Beckage, N.E., Rousselet, J., Poirie, M., Lemeunier, F., Drezen, J.M., 2002. Visualization of polydnavirus sequences in a parasitoid wasp chromosome. J. Virol. 76, 5793–5796.
Drezen, J.M., Provost, B., Espagne, E., Cattolico, L., Dupuy, C., Poirie, M., Periquet, G., Huguet, E., 2003. Polydnavirus genome: integrated vs. free virus. J. Insect Physiol. 49, 407–417.
Edson, K.M., Vinson, S.B., Stoltz, D.B., Summers, M.D., 1981. Virus in a parasitoid wasp: suppression of the cellular immune response in the parasitoid’s host. Science 211, 582–583.
Espagne, E., Dupuy, C., Huguet, E., Cattolico, L., Provost, B., Martins, N., Poirie, M., Periquet, G., Drezen, J.M., 2004. Genome sequence of a polydnavirus: insights into symbiotic virus evolution. Science 306, 286–289.
Federici, B.A., Bigot, Y., 2003. Origin and evolution of polydnaviruses by symbiogenesis of insect DNA viruses in endoparasitic wasps. J. Insect Physiol. 49, 419–432.
Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.
Hughes, A.L., 1994. The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. B 256, 119–124.
Hughes, A.L., 1999. Adaptive Evolution of Genes and Genomes. Oxford University Press, New York.
Hughes, A.L., 2004. Birth-and-death evolution of protein-coding regions and concerted evolution of non-coding regions in the multi-component genomes of nanoviruses. Mol. Phylogenet. Evol. 30, 287–294.
Hughes, A.L., Nei, M., 1989. Evolution of the major histocompatibility complex: independent origin of nonclassical class I genes in different groups of mammals. Mol. Biol. Evol. 6, 559–579.
Hughes, A.L., Friedman, R., 2003. Poxvirus genome evolution by gene gain and loss. Mol. Phylogenet. Evol. 35, 186–195.
Hughes, A.L., Ekollu, V., Friedman, R., Rose, J.R., 2005. Gene family content-based phylogeny of prokaryotes: the effect of criteria for inferring homology. Syst. Biol. 54, 268–276.
Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75, 11720–11734.
Kroemer, J.A., Webb, B.A., 2004. Polydnavirus genes and genomes: emerging gene families and new insights into polydnavirus replication. Annu. Rev. Entomol. 49, 431–456.
Kumar, S., Tamura, K., Jakobsen, I.B., Nei, M., 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17, 1244–1245.
Leister, D., 2004. Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance gene. Trends Genet. 20, 116–122.
Lynch, M., O’Hely, M., Walsh, B., Force, A., 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159, 1789–1804.
Nei, M., Gojobori, T., 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426.
Nei, M., Gu, X., Sitnikova, T., 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U.S.A. 94, 7799–7806.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Stoltz, D.B., Krell, P., Summers, M.D., Vinson, S.B., 1984. Polydnaviridae—a proposed family of insect viruses with segmented, double-stranded, circular DNA genomes. Intervirology 21, 1–4.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Whitfield, J.B., 2002. Estimating the age of the polydnavirus/braconid wasp symbiosis. Proc. Natl. Acad. Sci. U.S.A. 99, 7508–7513.
Williams, B.A., Hirt, R.P., Lucocq, J.M., Embley, T.M., 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418, 865–869.