Evolutionary genomics of archaeal viruses: Unique viral genomes in the third domain of life



Virus Research 117 (2006) 52–67

David Prangishvili a, Roger A. Garrett b, Eugene V. Koonin c,∗

a Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Institut Pasteur, rue Dr. Roux 25, 75724 Paris Cedex 15, France
b Danish Archaea Centre, Institute of Molecular Biology and Physiology, University of Copenhagen, Sølvgade 83H, DK-1307 Copenhagen, Denmark
c National Center for Biotechnology Information/NLM/NIH, 8600 Rockville Pike, Bldg. 38A, Bethesda, MD 20894, USA

Available online 28 February 2006

This article is dedicated to the memory of Wolfram Zillig, the pioneer of the study of archaeal viruses.

Abstract

In terms of virion morphology, the known viruses of archaea fall into two distinct classes: viruses of mesophilic and moderately thermophilic Euryarchaeota closely resemble head-and-tail bacteriophages whereas viruses of hyperthermophilic Crenarchaeota show a variety of unique morphotypes. In accord with this distinction, the sequenced genomes of euryarchaeal viruses encode many proteins homologous to bacteriophage capsid proteins. In contrast, initial analysis of the crenarchaeal viral genomes revealed no relationships with bacteriophages and, generally, very few proteins with detectable homologs. Here we describe a re-analysis of the proteins encoded by archaeal viruses, with an emphasis on comparative genomics of the unique viruses of Crenarchaeota. Detailed examination of conserved domains and motifs uncovered a significant number of previously unnoticed homologous relationships among the proteins of crenarchaeal viruses and between viral proteins and those from cellular life forms and allowed functional predictions for some of these conserved genes. A small pool of genes is shared by overlapping subsets of crenarchaeal viruses, in a general analogy with the metagenome structure of bacteriophages. The proteins encoded by the genes belonging to this pool include predicted transcription regulators, ATPases implicated in viral DNA replication and packaging, enzymes of DNA precursor metabolism, RNA modification enzymes, and glycosylases. In addition, each of the crenarchaeal viruses encodes several proteins with prokaryotic but not viral homologs, some of which, predictably, seem to have been scavenged from the crenarchaeal hosts, but others might have been acquired from bacteria. We conclude that crenarchaeal viruses are, in general, evolutionarily unrelated to other known viruses and, probably, evolved via independent accretion of genes derived from the hosts and, through more complex routes of horizontal gene transfer, from other prokaryotes.

Published by Elsevier B.V.

Keywords: Crenarchaeal viruses; Virus evolution; Hyperthermophiles; Horizontal gene transfer

∗ Corresponding author. Tel.: +1 301 435 5913; fax: +1 301 435 7794. E-mail address: [email protected] (E.V. Koonin).

0168-1702/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.virusres.2006.01.007

1. Introduction

Most if not all of the known cellular life forms play host to an enormous variety of viruses. Obviously, viruses as a whole have had multiple origins (Bamford, 2003). However, comparative genomic analysis allowed the demonstration of a monophyletic origin for large groups of viruses, e.g., the eukaryotic nucleocytoplasmic large DNA viruses which include such diverse families as poxviruses, asfarviruses, iridoviruses, phycodnaviruses, and the mimivirus (Iyer et al., 2001; Raoult et al., 2004). Bacteriophages (phages for short), viruses infecting bacteria, comprise an enormously diverse set of families which are unrelated to each other at the level of complete genomes (Ackermann, 2003; Casjens, 2005; Rohwer, 2003). However, the numerous head-and-tail phages (Caudovirales), which constitute the great majority of phages isolated from any bacterial community (Ackermann, 1998), have been shown to share a common gene pool within which phage genomes are extensively mixed and matched via recombination, which led to the striking parable “all the world’s a phage” (Hendrix et al., 1999; Pedulla et al., 2003). Bacteriophages, with the possible exception of single-stranded RNA phages (such as MS2 and Qβ), do not show direct, vertical evolutionary relationships with eukaryotic viruses although there are a number of common genes between certain families of bacteriophages and eukaryotic viruses. Most of these shared genes encode proteins involved in viral DNA

D. Prangishvili et al. / Virus Research 117 (2006) 52–67

replication and packaging, and nucleotide metabolism, such as DNA polymerases, primases, helicases, other ATPases, including terminases, dUTPases, and thymidine kinases, but also some proteins involved in viral morphogenesis, such as proteases and glycosylases (Cheng et al., 2004; Iyer et al., 2006, 2005; Knopf, 1998; Markine-Goriaynoff et al., 2004). Conceivably, the origin of eukaryotic viruses involved fairly extensive borrowing from the pool of phage genes (see Iyer et al., 2006, for a more detailed discussion of the possible relationships between the gene pools of eukaryotic viruses and bacteriophages). From a general evolutionary standpoint, viruses of the third domain of life, archaea, are of major interest. Most cellular characteristics of archaea resemble those of bacteria and, accordingly, one might expect that archaeal viruses would resemble bacteriophages. Moreover, given the existence of a shared gene pool of tailed phages (Casjens, 2005; Hendrix et al., 1999; Pedulla et al., 2003) and also the well-established and extensive horizontal gene transfer (HGT) between bacteria and archaea (Koonin et al., 2001, 1997; Lawrence and Hendrickson, 2003), it even could be anticipated that archaeal viruses would share many genes with bacteriophages and, effectively, would represent distinct phage varieties. However, the information-processing systems of archaea are distinct from the bacterial counterparts but monophyletic with the eukaryotic ones (Brown and Doolittle, 1997; Edgell and Doolittle, 1997; Leipe et al., 1999), which suggests the intriguing possibility that archaea might harbor viruses related to eukaryotic viruses, at least with respect to the replication machinery. Archaeal viruses have not been extensively studied until very recently although there have been early reports on isolation of phage-like particles from halobacteria (Torsvik and Dundas, 1974; Wais et al., 1975). 
In the last few years, however, a major effort has been undertaken to isolate viruses from hyperthermophilic archaea, particularly Crenarchaeota (Prangishvili and Garrett, 2004, 2005; Prangishvili et al., 2001; Rice et al., 2001; Snyder et al., 2003). Isolation of archaeal virus-host systems has been linked to the cultivability of the host strains and has been biased by the conditions used in screening procedures. For example, most of the viruses of extreme halophiles have been isolated by plaque assays on a limited number of host lawns (Bath and Dyall-Smith, 1998; Nuttall and Dyall-Smith, 1993; Porter et al., 2005; Wais et al., 1975), and many viruses of hyperthermophiles were primarily identified by the presence in host cells of a high copy number of extrachromosomal elements (Arnold et al., 2000a, 2000b; Geslin et al., 2003; Zillig et al., 1994).

All archaeal viruses isolated up to now have linear or circular, double-stranded (ds) DNA genomes. In common with their hosts, they show adaptation to extreme environments. Thus, viruses of extreme halophiles are stable only in solutions of high salt concentration (∼3–5 M) and are inactivated in solutions of low ionic strength (Witte et al., 1997); similarly, viruses of hyperthermophiles are stable at extremely high temperatures (Prangishvili et al., 1999; Schleper et al., 1992).

The unexpected result of recent efforts on isolation of archaeal viruses was the discovery of numerous, unusual shapes of virus particles (Prangishvili and Garrett, 2004, 2005; Snyder et al., 2003). Among these diverse morphotypes of archaeal viruses, most of which are not found among bacteriophages or eukaryotic viruses, there are only two that are encountered in both archaeal kingdoms, the Euryarchaeota and Crenarchaeota. These are: (i) spindle-shaped, enveloped virions with a short tail at one pointed end, without any counterparts in other domains (family Fuselloviridae) (Geslin et al., 2003); and (ii) spherical, lipid-containing virions with a layered shell appearance and no discernible tail, resembling the virions with an internal membrane of bacterial virus PRD1 and human adenovirus (Porter et al., 2005). Otherwise, euryarchaeal and crenarchaeal viruses dramatically differ in their morphotypes, genome organization, and gene content, so we discuss them separately.

Genome sequencing of archaeal viruses, in particular, the novel viruses of Crenarchaeota, revealed very few genes whose products showed significant sequence similarity to any known proteins. The unusual virion structure and the near lack of conserved proteins encoded in the genomes make these viruses a seemingly mysterious world of their own. Here we update the classification of archaeal viruses, discuss the salient features of their virion structure, genome organization, and expression, and report an exhaustive comparative sequence analysis of proteins encoded in their genomes, with a focus on the unusual viruses of Crenarchaeota. We report several previously unnoticed relationships between proteins of archaeal viruses and homologs from cellular life forms and the corresponding functional predictions. We also delineate the small pool of genes that are apparently exchanged among archaeal viruses and draw biological parallels between these viruses and temperate bacteriophages, the lack of an evolutionary relationship notwithstanding.

2. Viruses of Euryarchaeota

2.1. Families Myoviridae and Siphoviridae

Most known viruses of Euryarchaeota resemble tailed dsDNA bacteriophages, with icosahedral heads and helical tails, contractile or non-contractile, and, accordingly, have been assigned to the families Myoviridae and Siphoviridae, respectively (Table 1). The heads and tails of euryarchaeal viruses differ widely in size: from 40 to 90 nm in diameter, and from 60 to 230 nm in length, respectively. Typically, these viruses produce relatively complex protein patterns in SDS-PAGE, with up to three major bands and a dozen or more minor bands. All these viruses have linear dsDNA genomes, which vary in size from 30 to 230 kbp, with the typical G+C content of 60–70% reflecting the high G+C content of the host genomes.

The genomes and, more broadly, the nucleic acid complements of the virions, of several euryarchaeal viruses of these families have interesting peculiarities. Thus, in the genome of phage ΦN, all cytosine residues are replaced with 5-methylcytosine (Vogelsang-Wenke and Oesterhelt, 1988). The only known analogue carrying such a modification is the phage XP12 of Xanthomonas oryzae (Kuo and Tu, 1976). Since the host DNA is not nearly as extensively methylated as the viral genome, it could be predicted that the enzyme responsible for cytosine methylation was virus-encoded; it remains unknown whether or not ΦN has such an enzyme but a viral gene coding


Table 1
Viruses of Euryarchaeota

| Family, genus, virion morphology, species | Host | Temperate/lytic | dsDNA form, size (bp) | Sequence acc. nr | Description |
| --- | --- | --- | --- | --- | --- |
| Family Myoviridae | | | | | |
| Genus “ΦH-like viruses”: isometric head and contractile tail (head:tail, 64–50:120–170 nm) | | | | | |
| ΦH | Halobacterium salinarum | Temperate | Linear, 59,000 (a) | Available for genome fragments (b) | Gropp et al. (1989), Schnabel et al. (1982), Stolt et al. (1994) |
| ΦCh1 | Natrialba magadii | Temperate | Linear, 58,498 | AF440695 | Klein et al. (2002), Witte et al. (1997) |
| Hs1 (c) | Hbt. salinarum | Lytic | nd | nd | Torsvik and Dundas (1974, 1980) |
| Unassigned species in the family: isometric head and contractile tail (head:tail, 40–90:70–150 nm) | | | | | |
| Ja1 (c) | Hbt. salinarum | ? | Linear, 230,000 (a) | nd | Wais et al. (1975) |
| S45 (c) | Hbt. salinarum | Lytic | nd | nd | Daniels and Wais (1984) |
| S5100 (c) | Hbt. salinarum | Lytic | Linear, 65,000 | nd | Daniels and Wais (1990) |
| HF1 | Haloferax, Hbt. salinarum | Lytic | Linear, 75,898 | AY190604 | Nuttall and Dyall-Smith (1993), Tang et al. (2004) |
| HF2 | Hrb. coriense | Lytic | Linear, 77,670 | AF222060 | Nuttall and Dyall-Smith (1993, 1995), Tang et al. (2002) |
| Family Siphoviridae | | | | | |
| Genus “ψM1-like viruses”: isometric head and non-contractile tail (head:tail, 40–55:60–230 nm) | | | | | |
| ψM1, ψM2 (d) | Methanothermobacter marburgensis | Lytic | Linear, 30,400, 26,111 | AF065412, AF065411 | Jordan et al. (1989), Meile et al. (1989), Pfister et al. (1998) |
| ψM100 (e, f) | Methanothermobacter wolfeii | Temperate | Linear, 28,798 | AF301375 | Luo et al. (2001) |
| PG | Methanobrevibacter smithii | Lytic | Linear, 50,000 (a) | nd | Bertani and Baresi (1986) |
| ΦF1 | Methanobacterium sp. | Lytic | Linear, 85,000 (a) | nd | Nolling and Groffen (1993) |
| ΦF3 | Methanobacterium thermoautotrophicum | Lytic | nd, 36,000 (a) | nd | Nolling and Groffen (1993) |
| Unassigned species in the family: isometric head and non-contractile tail (head:tail, 40–55:60–230 nm) | | | | | |
| ΦN (c) | Hbt. salinarum | nd | Linear, 56,000 (a) | nd | Vogelsang-Wenke and Oesterhelt (1988) |
| Hh1 (c) | Hbt. salinarum | Temperate (?) | Linear, 37,600 (a) | nd | Pauling (1982), Rohrmann et al. (1983) |
| Hh3 (c) | Hbt. salinarum | Temperate (?) | Linear, 29,600 (a) | nd | Pauling (1982), Rohrmann et al. (1983) |
| B10 (c) | Halobacterium sp. | nd | nd | nd | Torsvik (1982) |
| Floating genus Salterprovirus: spindle-shaped, pleomorphic (44 nm × 74 nm) | | | | | |
| His1 | Har. hispanica | Lytic | Linear, 14,900 (a) | nd | Bath and Dyall-Smith (1998) |
| Unclassified spindle-shaped (52–80 × 70–120) | | | | | |
| PAV1 (e) | Pyrococcus | Temperate (?) | ccc (g), 18,000 (a) | nd | Geslin et al. (2003) |
| VLP (e) | Methanococcus voltae A3 | Temperate (?) | ccc, 23,000 (a) | nd | Wood et al. (1989) |
| Unclassified isometric, 55 nm | | | | | |
| SH1 | Har. hispanica | Lytic | Linear, 30,900 (a) | nd | Bamford et al. (2005b), Porter et al. (2005) |

nd: not determined.
(a) Approximate values.
(b) X80163, X80162, X80161, X80164, X00805, X52504, AH004327, S63994, 405325, S63993, 405323, S63992.
(c) Viruses do not exist presently in laboratory collections.
(d) ψM2 is a deletion mutant of ψM1, lacking a 700 bp fragment.
(e) Infectivity of the virus-like particles has not been demonstrated.
(f) The virus has only been characterized as a defective provirus integrated into the host chromosome.
(g) Covalently closed, circular.

for a cytosine methyltransferase has been identified in the phage ΦH infecting the same host (Stolt et al., 1994). The nucleic acid content of some other euryarchaeal viruses is even more unusual. Thus, virions of the myovirus ΦCh1 of the haloalkaliphilic genera Natronobacterium and Natronococcus, along with the genomic dsDNA, contain uncharacterized, host-encoded RNA species of 80–700 nucleotides in length (Witte et al., 1997). In the virions of the siphovirus ψM1, only 85% of the DNA is the phage genome, whereas the rest are head-to-tail multimers of the cryptic plasmid of ∼4.5 kbp from Methanothermobacter that has no sequence similarity to the ψM1 genome (Meile et al., 1989).

Euryarchaeal viruses of these families show wide differences in host ranges but all infect mesophilic or moderately thermophilic extreme halophiles or methanogens. All viruses grow lytically, with burst sizes of between 140 for Ja1 (Wais et al., 1975) and 1300 for S45 (Daniels and Wais, 1984). The latent period varies from 7 h for ΦH to 17 h for Hs1. For the myoviruses ΦH and ΦCh1, true lysogeny, i.e., host cells carrying a latent prophage genome, has been demonstrated. The lysogenic mode of ΦH is similar to that of coliphage P1: the viral genome is not integrated into the host chromosome but persists in the host cells as a circular plasmid (Schnabel et al., 1984). By contrast, the genome of the very similar temperate myovirus ΦCh1 apparently integrates into the host chromosome (Witte et al., 1997). It has been shown that the virus-host interaction of Hs1 depends on the salt concentration in the medium (Torsvik and Dundas, 1974, 1980). Under the conditions of optimal growth at high salt concentration, the virus propagates within its host in a stable carrier state. However, under conditions of low salinity, which are suboptimal for the host, the virus becomes lytic. The virus strategy of getting out when things turn rough apparently is common to both prokaryotic domains.

For some viruses, e.g., ΦH, ΦCh1 and ψM1, the packaged genome has been shown to be circularly permuted and redundant at both ends, suggesting that the genomes are packaged by the headful mechanism (Black, 1989) and that packaging is relatively unspecific. By contrast, the genome of the virus HF2 does not have terminal redundancy but contains 306 bp direct terminal repeats and, by implication, should replicate and package via other mechanisms. Nevertheless, concatemeric viral genomes were found in cells replicating HF2 (Nuttall and Dyall-Smith, 1995).

In the sequenced genomes of archaeal viruses, many genes, often with presumably related functions, appeared to be cotranscribed and organized into operons (Gropp et al., 1992; Klein et al., 2002; Pfister et al., 1998; Stolt et al., 1994). Transcription has been studied in some detail only for ΦH and HF2 and has been found to be strictly time-dependent: early, intermediate, and late transcripts are clearly distinguishable (Gropp et al., 1992; Tang et al., 2002).
In the case of ΦH, the three classes of genes appeared to be more or less continuously expressed after being turned on, i.e., most of the early and intermediate transcripts were synthesized also at the late stages of infection. Early transcription has been found to be essential for the expression of the intermediate and late genes. Transcription of the ΦH genome is regulated by a viral transcription repressor, which prevents the formation of the major early lytic transcript T4 (Ken and Hackett, 1991; Stolt and Zillig, 1994b). The promoters of the repressor (rep) and T4 genes are adjacent but inversely oriented, similarly to the configuration of the cI and cro promoters in bacteriophage λ. Transcription from these ΦH promoters is mutually exclusive, with only the rep gene transcribed in the lysogenic state and only the T4 transcript produced during the lytic cycle (Stolt and Zillig, 1994a, 1994b). In addition, it has been shown that ΦH employs regulation of gene expression based on antisense RNA, which, in lysogens, mediates the removal of the ribosome-binding site from a transcript involved in the lytic cycle (Stolt and Zillig, 1993).

Similar to the bacteriophages of the families Myoviridae and Siphoviridae, archaeal members of these families show a high degree of genome instability. The genome of ΦH is highly variable due to recombination with the host as well as duplication and inversion of the so-called L-region (Reiter et al., 1988). The genome of ΦCh1 contains an invertible region that encodes a recombinase and structural proteins, which is reminiscent of the invertible genome segments of bacteriophages like Mu or P1. As in the latter case, inversion of the segment results in variation in the structure of virion proteins, indicating that this mechanism for generating variability is shared by myoviruses across the two prokaryotic domains (Rossler et al., 2004).

Several observations indicate a high level of recombination among euryarchaeal myoviruses. The HF2 genome, except for a single base change, is identical in sequence to the genome of HF1 over the first 48 kb, the region that encompasses early and intermediate genes; however, over the rest of the genomes (late gene region), the two viruses are only 87% identical, suggesting a recent recombination event between either HF1 or HF2 and another HF-like halovirus (Tang et al., 2004). Moreover, the HF2 genome appears to be a mosaic of components from widely different sources, suggesting that euryarchaeal myoviruses, like their bacterial counterparts, are vectors that shuttle genetic material over wide taxonomic distances, even across domains (Tang et al., 2002).

2.2. Spindle-shaped viruses

Transmission electron microscopy (TEM) of a culture of Methanococcus voltae strain A3 revealed the presence of spindle-shaped particles with a short tail on one pointed end (Wood et al., 1989). The structure resembled those of the crenarchaeal members of the family Fuselloviridae (see below). The fuselloviruses have a circular dsDNA genome, which is found also site-specifically integrated into the host chromosome. Attempts at virus induction by subjecting the host to different types of stress proved ineffective. The infectivity of these virus-like particles (VLPs) could not be demonstrated, most likely due to a failure in host isolation. A similar virus-like particle, named PAV1, was isolated from a hyperthermophilic euryarchaeote, Pyrococcus abyssi strain GE23 (Geslin et al., 2003).
Again, the infectivity of the particles, which most likely represented the family Fuselloviridae, could not be demonstrated. The halovirus His1, which has relatively flexible capsids resembling those of fuselloviruses and a small, linear DNA genome, was originally assigned to the family Fuselloviridae (Bath and Dyall-Smith, 1998). However, subsequent analysis showed that this virus has a lytic life cycle, its genome is linear, replicates by a protein-primed DNA synthesis, and encodes a DNA polymerase. All these features fundamentally differ from those of the fuselloviruses, which led to the reclassification of His1 into the genus Salterprovirus (Dyall-Smith et al., 2003).

2.3. A putative archaeal member of the family Tectiviridae

In addition to myoviruses and siphoviruses, Euryarchaeota probably support the replication of another viral family that currently includes only bacterial viruses, the Tectiviridae. The lytic virus SH1, which infects extremely halophilic archaea of the genus Haloarcula, shows gross morphological features considered to be typical of members of this family. Specifically, the polyhedral virion of SH1 has no discernible tail, a distinct proteinaceous outer layer, and a lipid layer underneath it (Bamford et al., 2005b; Dyall-Smith et al., 2003; Porter et al., 2005). However, definitive taxonomic assignment of this virus has not been proposed.

3. Viruses of Crenarchaeota

Most known dsDNA viruses that replicate in crenarchaea have morphotypes that have not previously been observed among dsDNA viruses of Euryarchaeota, Bacteria or Eukarya (Haring et al., 2005a; Rachel et al., 2002; Rice et al., 2001) although there are exceptions including the spherical PSV (Haring et al., 2004) and the larger, also spherical STIV (Rice et al., 2004) (Fig. 1). Moreover, some crenarchaeal viral morphotypes, including the rod-like structures of the rudiviruses, have been observed among eukaryal single-stranded (ss) RNA viruses. The crenarchaeal viruses that have unique virion structures are the droplet-shaped virions of the Guttaviridae (Arnold et al., 2000a), the bottle-shaped virions of the Ampullaviridae (Haring et al., 2005a), and the two-tailed virion of the Bicaudaviridae (Haring et al., 2005c) (Fig. 1). Crenarchaeal viruses so far have been classified into seven families, primarily on the basis of their unusual or unique morphotypes, and this classification is reinforced by the genomic properties of the viruses (Table 2).

Almost all of the crenarchaeal viruses are enveloped, in contrast to the large majority of the characterized bacterial and euryarchaeal viruses. Exceptions are members of the Rudiviridae, whose virions consist of dsDNA complexed with multiple copies of a single 15 kDa DNA-binding protein, which is reminiscent of the structure of the ssRNA-containing rod-shaped virions of the plant viruses in the family Tobamoviridae (Prangishvili et al., 1999; Vestergaard et al., 2005). Interestingly, the spherical virion of the Globuloviridae consists of an envelope encasing a helical nucleoprotein core that has a structure similar to that of the ribonucleoprotein of another family of eukaryal ssRNA viruses, the Paramyxoviridae (Haring et al., 2004).

A distinct feature of the known crenarchaeal viruses is their stable relationship with the host. So far, only two crenarchaeal viruses, TTV1 and ATV, have been shown to lyse their host cells (Haring et al., 2005c; Janekovic et al., 1983). Most of the known viruses carry linear genomes, which persist at a low copy number in the cell and do not integrate into the host chromosome. The host maintains the virus stably while multiplying, suggesting that an equilibrium exists between cell division and virus replication. Moreover, for some crenarchaeal viruses, it has been demonstrated that this stability remains unaffected by stress factors, such as UV-irradiation or treatment with mitomycin C (Haring et al., 2005b). For SIRV1 and SIRV2 from the family Rudiviridae, transcription has been studied in some detail, and these experiments revealed simultaneous expression along most of the genome, with minimal temporal control (Kessler et al., 2004).

Crenarchaeal viruses with circular genomes from the families Fuselloviridae and Bicaudaviridae, and the unclassified STSV1 all encode a single integrase gene. The fuselloviruses and the bicaudavirus ATV have been shown to integrate into host chromosomes, whereas STSV1 has not been detected in an integrated state (Haring et al., 2005c; Schleper et al., 1992; Xiang et al., 2005). In the case of fuselloviruses, at least, integration occurs within specific tRNA genes (Schleper et al., 1992; Wiedenheft et al., 2004). Infected cell cultures also contain cells with covalently closed circular forms of the viral genome and, at present, it is unclear whether an episomal plasmid and integrated

Fig. 1. Crenarchaeal viruses and orthologous relationships among their genes. The orthologous relationships between viral gene sets are depicted in the form of a graph, with electron-microscopic images of the respective virions in the vertices (the bars are 200 nm; the image of TTSV1 was unavailable). The numbers at each edge show the number of inferred orthologous genes between the respective viruses. The thickness of the lines is roughly proportional to the number of orthologs. Images of ATV, ARV1, SIRV1, and PSV are from M. Häring, R. Rachel, and D.P., the images of SIFV and SSV1 are courtesy of Wolfram Zillig, the image of STIV is courtesy of Mark Young, the image of STSV1 is courtesy of Li Huang, and the image of AFV1 is modified from Bettstetter et al. (2003).
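The graph described in the caption of Fig. 1 can be thought of as a weighted, undirected graph whose edge weights are counts of shared orthologs. The following Python sketch shows one way such edge weights could be derived once each viral protein has been assigned to a gene family. It is only an illustration under stated assumptions: the function name `shared_ortholog_graph`, the virus labels, and the family labels are all hypothetical placeholders and do not reproduce the data or the procedure of this study.

```python
from itertools import combinations


def shared_ortholog_graph(gene_families):
    """Map each pair of viruses to the number of gene families they share.

    gene_families: dict of virus name -> set of family labels.
    Pairs with no shared family get no edge.
    """
    edges = {}
    for a, b in combinations(sorted(gene_families), 2):
        shared = gene_families[a] & gene_families[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical toy input; the labels are placeholders, not the paper's data.
toy_families = {
    "virus_A": {"fam1", "fam2", "fam3"},
    "virus_B": {"fam2", "fam3", "fam4"},
    "virus_C": {"fam3", "fam5"},
    "virus_D": {"fam6"},
}

toy_edges = shared_ortholog_graph(toy_families)
for (a, b), n in sorted(toy_edges.items()):
    print(f"{a} -- {b}: {n} shared families")
```

In a drawing of this kind, the line thickness would simply be scaled by the edge weight, and a virus with no shared families (here `virus_D`) would appear as an isolated vertex.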


Table 2
Viruses of Crenarchaeota

| Family, genus, virion morphology, species | Host | Lipids | dsDNA form, size (bp) | Genome integration | Sequence acc. nr | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Family Fuselloviridae | | | | | | |
| Genus Fusellovirus: spindle-shaped (100 nm × 60 nm) | | | | | | |
| SSV1 | Sulfolobus | + | ccc, 15,465 | + | X07234 | Martin et al. (1984), Palm et al. (1991), Schleper et al. (1992) |
| SSV2 | Sulfolobus | nd | ccc, 14,796 | + | AY370762 | Stedman et al. (2003) |
| SS-K1 | Sulfolobus | nd | ccc, 17,385 | + | AY423772 | Wiedenheft et al. (2004) |
| SSVRH | Sulfolobus | nd | ccc, 16,473 | + | AY388628 | Wiedenheft et al. (2004) |
| Family Lipothrixviridae | | | | | | |
| Genus Alphalipothrixvirus: non-flexible filament (38 nm × 410 nm) | | | | | | |
| TTV1 (a) | Thermoproteus tenax | + | Linear, 15,900 | nd | X14855 (85% of DNA) | Janekovic et al. (1983), Neumann et al. (1989) |
| Genus Betalipothrixvirus: flexible filament (1950 nm × 24 nm) with tapered ends and mop-like terminal structures | | | | | | |
| SIFV | Sulfolobus | + | Linear, 40,852 | − | AF440571 (96% of DNA) | Arnold et al. (2000b) |
| TTV2 (a) | T. tenax | nd | nd, 16,000 | nd | nd | Janekovic et al. (1983) |
| TTV3 (a) | T. tenax | nd | nd, 27,000 | nd | nd | Janekovic et al. (1983) |
| Genus Gammalipothrixvirus: flexible filament (900 nm × 24 nm) with claw-like termini that close on contact with host pili | | | | | | |
| AFV1 | Acidianus | + | Linear, 21,080 | − | AJ567472 | Bettstetter et al. (2003) |
| Genus “Deltalipothrixvirus” (b): flexible filament (1100 × 24) with terminal collar-like structure with two sets of inserted filaments | | | | | | |
| AFV2 | Acidianus | − | Linear, 31,787 | − | AJ854042 | Haring et al. (2005b) |
| Family Rudiviridae | | | | | | |
| Genus Rudivirus: stiff rod (610–900 nm × 23 nm) with three tail fibers at each terminus | | | | | | |
| SIRV1 | Sulfolobus | nd | Linear, 32,308 | − | AJ414696 | Prangishvili et al. (1999) |
| SIRV2 | Sulfolobus | nd | Linear, 35,450 | − | AJ344259 | Prangishvili et al. (1999) |
| ARV1 | Acidianus | nd | Linear, 24,655 | − | AJ875026 | Vestergaard et al. (2005) |
| Family Guttaviridae | | | | | | |
| Genus Guttavirus: droplet-shaped (100–185 nm × 70–95 nm) | | | | | | |
| SNDV (a) | Sulfolobus | ? | Circular, 20,000 (c) | nd | nd | Arnold et al. (2000a) |
| Family “Globuloviridae” (b) | | | | | | |
| Genus “Globulovirus” (b): spherical (Ø70–100 nm) with helical nucleoprotein core | | | | | | |
| PSV | Pyrobaculum sp., T. tenax | + | Linear, 28,337 | | AJ635162 | Haring et al. (2004) |
| TTSV1 | T. tenax | ? | Linear, 20,933 | nd | AY722806 | Ahn et al. (2004) |
| Family “Ampullaviridae” (b) | | | | | | |
| Genus “Ampullavirus” (b): bottle-shaped (230 × 75 [broad end]–4 [pointed end] nm) | | | | | | |
| ABV | Acidianus | nd | Linear, 23,900 (c) | nd | nd | Haring et al. (2005a) |
| Family “Bicaudaviridae” (b) | | | | | | |
| Genus “Bicaudavirus” (b): spindle-shaped (110–180 nm × 70–100 nm) with two long tails (total length ∼1000 nm) | | | | | | |
| ATV | Acidianus sp. | + | ccc, 62,730 | + | AJ888457 | Haring et al. (2005c) |
| Unclassified isometric (Ø60 nm) | | | | | | |
| STIV | Sulfolobus | | ccc, 17,663 | nd | AY569307 | Rice et al. (2004) |
| Unclassified spindle-shaped (230 nm × 107 nm) | | | | | | |
| STSV1 | Sulfolobus | | ccc, 75,294 | | AJ783769 | Xiang et al. (2005) |

nd: not determined.
(a) Viruses currently available in laboratory collections.
(b) Taxonomic proposal is pending at the ICTV.
(c) Approximate values.

provirus co-exist in the same cell. Another distinguishing feature of the crenarchaeal viruses with circular genomes is that their replication can be induced by stress factors like UVirradiation or mytomycin C and, for the bicaudavirus ATV, even by lowering the temperature below that of the optimal temperature of host growth (Haring et al., 2005c). Induction of ATV replication eventually leads to cellular lysis (Haring et

al., 2005c); in contrast, in the case of the fuselloviruses, induction is not deleterious to the cell and, after a burst of viral replication, the amount of virus returns to its original level (Martin et al., 1984). It remains unclear whether induction is coupled to the excision of the integrated forms of the viral genome. Transcription analysis following induction in SSV1 lysogens revealed a uniform and continuous transcription pattern

58

D. Prangishvili et al. / Virus Research 117 (2006) 52–67

with the induction of only one small transcript (Reiter et al., 1987). The two DNA strands of the linear genomes of the rudiviruses are covalently linked in a terminal hairpin (Blum et al., 2001; Peng et al., 2004); a similar terminal structure may also occur in the globulovirus PSV (Haring et al., 2004). The terminal structures of the linear viral genomes of the families Lipothrixviridae and Ampullaviridae are different. Two of these viruses, TTV1 and AFV1, have been shown to produce low molar yields of terminal restriction fragments after phenol extraction of a genomic restriction digest (Bettstetter et al., 2003; Janekovic et al., 1983). This suggests that the termini are hydrophobic, possibly reflecting the presence of proteins tightly, perhaps covalently, bound to the ends of the genomic DNA. Typically, the linear genomes of crenarchaeal viruses carry inverted terminal repeats. These repeats differ in length in the range of 6–7% of the genome length in the rudiviruses to only 11 bp in the lipothrixvirus AFV1 (Bettstetter et al., 2003; Peng et al., 2001). However, in the AFV1 genome, ∼300 bp at each end carry multiple direct repeats of the pentanucleotide TTGTT, and close variants thereof, on opposite strands. In the lipothrixvirus AFV2, a long stretch of non-coding sequence with short, imperfect, direct repeats occurs in the central part of the genome; these repeats might be involved in internal initiation of genome replication (Haring et al., 2005b). A mechanism for genome replication has been proposed only for the rudiviruses. Replication is thought to initiate with the introduction of a single-strand nick at position 11 from the covalently closed termini, yielding a free 3 -hydroxyl group for priming (Blum et al., 2001). Adjacent to this nicking site is a virus-specific tetranucleotide signature sequence that may be recognised by the nicking enzyme (Peng et al., 2004). 
It has been proposed that unpairing of one strand then precedes elongation of the 3′-termini of the other. On reaching the end of the template, the elongated strand folds back on itself, and the remainder of the genome is copied. Head-to-head or tail-to-tail concatemeric linked intermediates are formed (Peng et al., 2001) and are probably processed into single progeny genomes by the virus-encoded Holliday junction resolvase (Birkenbihl et al., 2001). This model is similar to that proposed for some eukaryal cytoplasmic DNA viruses, including the Poxviridae (DeLange et al., 1986), and was originally developed on the basis of similarities in genome organisation and in the types of replicative intermediates formed by these viruses (Peng et al., 2001).

4. Comparative genomics of archaeal viruses

4.1. A small pool of conserved genes shared by otherwise unique genomes revealed by detailed protein sequence analysis

As outlined above, most of the identified viruses of the Crenarchaeota have highly unusual morphotypes, and initial analyses of their genomes revealed very few genes coding for proteins homologous to any sequences in the existing sequence databases, be it proteins of other viruses or those of cellular life forms. However, it is well known that viral proteins tend

to change at a fast pace, often obliterating sequence conservation beyond subtle motifs; besides, the databases themselves change rapidly, with new sequences adding diversity. Therefore, we decided to systematically re-examine the protein sequences of the mysterious crenarchaeal viruses by using iterative search with PSI-BLAST (Altschul and Koonin, 1998; Altschul et al., 1997) and search for conserved domains implemented in the CDD (Marchler-Bauer and Bryant, 2004) and SMART (Letunic et al., 2004) systems. The results of all these searches were examined case by case manually, in an attempt to identify sequence similarities that may be only marginally statistically significant but nevertheless are likely to reflect bona fide homologous relationships (Aravind and Koonin, 1999b). For comparative purposes, the same strategy was applied to the re-analysis of the available genome sequences of euryarchaeal viruses. The following master set of crenarchaeal viruses was employed for this analysis: Acidianus filamentous virus 1 (AFV1), Acidianus rod-shaped virus 1 (ARV1), Acidianus two-tailed virus (ATV), Pyrobaculum spherical virus (PSV), Sulfolobus islandicus filamentous virus (SIFV), Sulfolobus islandicus rod-shaped virus 1 (SIRV1), Sulfolobus shibatae virus 1 (SSV1), Sulfolobus turreted icosahedral virus (STIV), Sulfolobus tengchongensis spindle-shaped virus 1 (STSV1), and Thermoproteus tenax spherical virus 1 (TTSV1). The results of sequence analysis of virus isolates closely related to the viruses in this master set are not included but are mentioned in the text where significant differences were observed. 
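The search strategy described above (iterative profile searches followed by manual inspection of marginal hits) can be sketched as follows. This is an illustrative outline only: it assumes the present-day NCBI BLAST+ `psiblast` program rather than whatever version was used for the original analysis, and the file names, database name, E-value thresholds, and hit labels are hypothetical choices, not values taken from the paper.

```python
from typing import List, Sequence, Tuple


def build_psiblast_command(
    query_fasta: str,
    database: str,
    iterations: int = 5,
    inclusion_evalue: float = 0.01,
    report_evalue: float = 10.0,
    out_path: str = "hits.tsv",
) -> List[str]:
    """Assemble a psiblast (BLAST+) command line without running it."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),  # hits used to build the profile
        "-evalue", str(report_evalue),                # hits merely reported
        "-outfmt", "6",                               # tabular output
        "-out", out_path,
    ]


def triage_hits(
    hits: Sequence[Tuple[str, float]],
    inclusion_evalue: float = 0.01,
    report_evalue: float = 10.0,
) -> Tuple[List[str], List[str]]:
    """Split (subject, E-value) pairs into confident hits and borderline
    hits that would be set aside for case-by-case manual examination."""
    confident = [name for name, e in hits if e <= inclusion_evalue]
    borderline = [name for name, e in hits if inclusion_evalue < e <= report_evalue]
    return confident, borderline


# Hypothetical example inputs.
command = build_psiblast_command("viral_protein.fasta", "nr")
confident, borderline = triage_hits([("hitA", 1e-20), ("hitB", 0.5), ("hitC", 50.0)])
```

Reporting hits well above the inclusion threshold is what makes the manual step possible: marginal similarities stay visible in the output without contaminating the profile.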
The results of this analysis are mixed: although we did manage to uncover a considerable number of previously unnoticed evolutionary connections of archaeal viral proteins and, accordingly, made a variety of new functional predictions (Tables 1S and 2S in Supplementary material and see below), the genomescapes of these viruses remained barren, with only a small fraction of genes coding for proteins that are conserved in other viruses, cellular life forms or both (Fig. 2). Crenarchaeal viruses substantially differ in this respect: SIRV, SIFV, and STSV1 have a considerable fraction of evolutionarily conserved genes, whereas the gene repertoires of PSV and, especially, TTSV1 remain almost entirely mysterious (Fig. 2 and Table 1S). Somewhat unexpectedly, a similar breakdown of the genes of euryarchaeal viruses (that are often considered almost garden variety phages) gives comparable results (Fig. 2) although the evolutionary affinities and predictable functions of conserved genes differ dramatically (see below). Some of the short open reading frames (ORFs) that are currently annotated in archaeal viral genomes may be over-prediction artefacts (in the course of this analysis we did not attempt a complete reannotation) but most of the ORFs are conserved between closely related genomes (e.g., SSV1 and SSV2 or SIRV1 and SIRV2; data not shown) indicating that they are bona fide protein-coding genes. Thus, the conclusion is undeniable that, at least with the current coverage of viral and cellular genomes, a considerable majority of archaeal viral genes have no detectable homologs (other than in closely related viral isolates). This paucity of genes with detectable homologs is not unlike the situation in complex eukaryotic viruses (e.g., phycodnaviruses or the mimivirus, Iyer et al., 2006) and is, probably, explained by rapid evolution of


Fig. 2. Breakdown of the gene sets of the Crenarchaeal and Euryarchaeal viruses by evolutionary affinities. Viral + Cell, genes with homologs in other viruses and cellular life forms; Viral only, genes with homologs detectable only in other viruses; Cellular, genes with homologs detectable only in cellular life forms; Unique, genes without detectable homologs. Homologs in closely related viruses (including the pairs ARV1-SIRV1 and PSV-TTSV1) were disregarded.
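The four categories used in Fig. 2 amount to a two-flag classification of each gene. The sketch below shows that logic under the assumption that homolog detection has already been done and that homologs in closely related isolates have been excluded beforehand; the gene labels and flags are invented for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_gene(has_viral_homolog: bool, has_cellular_homolog: bool) -> str:
    """Assign a gene to one of the four categories of Fig. 2."""
    if has_viral_homolog and has_cellular_homolog:
        return "Viral + Cell"
    if has_viral_homolog:
        return "Viral only"
    if has_cellular_homolog:
        return "Cellular"
    return "Unique"


def breakdown(genes: Dict[str, Tuple[bool, bool]]) -> Counter:
    """Count genes per category; values are (viral, cellular) homolog flags."""
    return Counter(classify_gene(viral, cell) for viral, cell in genes.values())


# Hypothetical toy genome with five genes.
toy_genes = {
    "gene1": (True, True),
    "gene2": (False, True),
    "gene3": (False, False),
    "gene4": (False, False),
    "gene5": (True, False),
}
toy_breakdown = breakdown(toy_genes)
```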

many viral proteins as they adapt to perform virus-specific functions. Probably, the most notable result of this analysis is the delineation of a set of 15 proteins or protein families that are shared by overlapping subsets of crenarchaeal viruses (Table 3). Here, we deliberately take an inclusive view of homology by lumping together genuine orthologs, i.e., those proteins that were most closely related within the analyzed set of crenarchaeal viruses (Koonin, 2005), compared to homologs from all other sources, and more distantly related members of the same family

that might have been acquired by viruses via different routes. This approach is taken in order to obtain a complete picture of the shared portions of crenarchaeal gene repertoires, with the ensuing functional implications. The orthologous relationships, which are critical for inferring the evolutionary connections between viruses as distinct genetic entities, are summarized in the next section. Here we discuss the most prominent functional groups of proteins shared by crenarchaeal viruses as well as some of the unique functions found in individual viruses.

Table 3
Genes and gene families shared by different viruses of Crenarchaeota (a)

Protein (family) | AFV1 | SIFV | SIRV | ARV1 | SSV1 | STSV1 | STIV | ATV | PSV | TTSV
RHH domains, predicted transcription regulators | 2 | 1 | 2 | 1 | 2 | 6 | 2 | 2 | 0 | 0
HTH domains, predicted transcription regulators (b) | 0 | 1 | 2 | 1 | 1 | 0 | 2 | 0 | 3 | 0
Looped-hinge helix (AbrB/SpoVT-like) domains, predicted transcription regulators (c) | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
C2H2 Zn-finger proteins, possible transcription regulators | 1 | 0 | 1 | 0 | 3 | 0 | 2 | 1 | 0 | 0
Queuine/archaeosine tRNA-ribosyltransferase | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0
Predicted SAM-dependent methyltransferase, possibly, involved in RNA modification | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
RecB family endonuclease (d) | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0
dUTPase | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
Flavin-dependent thymidylate synthase | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0
AAA+ ATPase homologous to the ATPase domain of Lon protease | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
AAA+ ATPase homologous to DnaA (probable role in replication initiation) | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0
ATPase of the FtsK-HerA superfamily, possibly, involved in DNA packaging (e) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1
XerC/D-like integrase/recombinase | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0
Glycosyltransferase | 2 | 5 | 3 | 3 | 0 | 1 | 0 | 0 | 0 | 0
Uncharacterized YddF family | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0
Uncharacterized, crenarchaeal-virus-specific family (e.g., AFV1p03) | 1 | 1 | >4 | >4 | 0 | 0 | 0 | 0 | 0 | 0

(a) The numbers in the table indicate the detected number of representatives of the respective protein family encoded in the respective viral genome; the accession numbers of the corresponding genes and additional information are given in the Supplementary Table 1S.
(b) The HTH-domain proteins in all viruses seem to have originated independently.
(c) The AbrB/SpoVT-like proteins of SIFV and STSV1 obviously have different origins.
(d) The RecB-family endonucleases of AFV, SIRV and SSV are obviously orthologous; however, SIFV has a distinct form that might have been acquired independently from the archaeal host.
(e) The FtsK-HerA family ATPases of PSV and TTSV are orthologs; however, the ATPase of this family encoded by STIV is distinct and probably was acquired independently.
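The notion of a gene pool shared by overlapping subsets of viruses can be made concrete by treating Table 3 as a presence/absence matrix. The sketch below is an illustration, not part of the original analysis: the counts are transcribed from the extracted Table 3 and should be checked against the published table, the entry ">4" is stored as 5 because only presence matters here, and the family labels are shortened. As the table footnotes stress, sharing a family in this inclusive sense does not imply orthology.

```python
from typing import Dict, List

VIRUSES = ["AFV1", "SIFV", "SIRV", "ARV1", "SSV1", "STSV1", "STIV", "ATV", "PSV", "TTSV"]

# Copy number of each family per virus, in the order of VIRUSES.
TABLE3: Dict[str, List[int]] = {
    "RHH": [2, 1, 2, 1, 2, 6, 2, 2, 0, 0],
    "HTH": [0, 1, 2, 1, 1, 0, 2, 0, 3, 0],
    "AbrB/SpoVT-like": [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "C2H2 Zn-finger": [1, 0, 1, 0, 3, 0, 2, 1, 0, 0],
    "Queuine/archaeosine tRNA-ribosyltransferase": [0, 0, 2, 0, 0, 1, 0, 0, 0, 0],
    "SAM-dependent methyltransferase": [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "RecB family endonuclease": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "dUTPase": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "ThyX": [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    "AAA+ ATPase (Lon-like)": [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "AAA+ ATPase (DnaA-like)": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "FtsK-HerA ATPase": [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    "XerC/D-like integrase": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "Glycosyltransferase": [2, 5, 3, 3, 0, 1, 0, 0, 0, 0],
    "YddF family": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    "Crenarchaeal-virus-specific family": [1, 1, 5, 5, 0, 0, 0, 0, 0, 0],  # 5 stands for ">4"
}


def shared_families(virus_a: str, virus_b: str) -> List[str]:
    """Families with at least one representative in both viruses."""
    i, j = VIRUSES.index(virus_a), VIRUSES.index(virus_b)
    return [fam for fam, counts in TABLE3.items() if counts[i] > 0 and counts[j] > 0]


def families_per_virus() -> Dict[str, int]:
    """Number of Table 3 families represented in each viral genome."""
    return {
        virus: sum(1 for counts in TABLE3.values() if counts[k] > 0)
        for k, virus in enumerate(VIRUSES)
    }
```

For example, `shared_families("PSV", "TTSV")` returns only the FtsK-HerA ATPase, which mirrors how little of the shared pool is present in these two genomes.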


4.2. Predicted transcription regulators: the ribbon-helix-helix domain

Somewhat unexpectedly, it turns out that the most common gene products in crenarchaeal viruses are small proteins containing the ribbon-helix-helix (RHH) domain. With the exception of PSV and TTSV1, all sequenced crenarchaeal viral genomes encode at least one RHH-domain protein, and STSV1 has a notable expansion of six paralogs (Table 3). Some of the RHH domains are not easy to recognize by sequence analysis owing to their small size (∼50 amino acids) and limited sequence conservation; therefore identification of the full complement of these domains among viral proteins required careful inspection of the results of PSI-BLAST and CDD searches. The RHH domain may be considered a distinct, highly derived version of the classic helix-turn-helix (HTH) DNA-binding domain (Aravind et al., 2005). Typically, the RHH-domain proteins are transcription regulators, the best characterized ones being the methionine repressor MetJ (Somers and Phillips, 1992), the bacteriophage P22 Arc repressor (Cordes et al., 1999; Raumann et al., 1994), and the plasmid-encoded repressor CopG (Gomis-Ruth et al., 1998). The RHH domain consists of a β-strand and two α-helices and typically forms dimers, with the helices involved in dimerization and the strands, which together form a β-ribbon, recognizing the target sequences in DNA by inserting into the major groove (Gomis-Ruth et al., 1998; Suzuki, 1995). In retrospect, the prevalence of the RHH domains in crenarchaeal viruses may not be particularly surprising given that small RHH-domain proteins are common in archaea, being nearly as abundant as typical HTH domains (Aravind and Koonin, 1999a; Perez-Rueda et al., 2004). It is notable, however, that no RHH-domain proteins were detected in the available genomes of euryarchaeal viruses (Table 2S), suggesting that these small and compact proteins are particularly apt for transcription regulation in hyperthermophiles.

Fig. 3a shows the alignment of all RHH-domain proteins detected in crenarchaeal viruses along with several archaeal and bacterial homologs. Although there are no invariant residues in this alignment, the regions corresponding to the strand and the two helices of the RHH domain show considerable conservation of amino acid properties, including the key hydrophobic residues involved in protein-DNA contacts (Fig. 3a); this supports the prediction that these viral proteins are functional transcription regulators. The RHH proteins of the crenarchaeal viruses are a heterogeneous lot. One well-defined set of highly conserved orthologs (represented, e.g., by AFV1 p08) consists of typical RHH domains with significant similarity to a variety of archaeal and bacterial homologs (Fig. 3a). AFV1 and ATV have tandem genes coding for closely related proteins of this group, clearly, the result of relatively recent duplications. In contrast, the family of paralogous RHH-domain proteins in STSV1 includes highly derived forms of this domain, with very weak similarity to RHH proteins from other viruses and prokaryotes; a novel feature of some of these STSV1 proteins is a duplication of the RHH domain (Fig. 3a). The genes encoding these proteins (with one exception) form a tandem array in the STSV1 genome, which

indicates their origin by intra-genomic duplication. In addition, SIRV and SIFV encode a distinct version of the RHH domain that is related to the plasmid partitioning proteins ParG/B. These proteins show minimal sequence similarity to other RHH domains but have the same mode of interaction with DNA and might also act as transcriptional repressors (Golovanov et al., 2003). STIV, on the other hand, in addition to a member of the conserved viral RHH orthologous set, encodes another RHH-domain protein distantly related to the other RHH-domains of crenarchaeal viruses but showing significant similarity to some archaeal RHH-proteins (Table 1S). Thus, the RHH-domains of crenarchaeal viruses obviously have a complex history that, probably, involved multiple independent acquisitions and horizontal gene mobility (see also discussion below).

4.3. Other transcriptional regulators: HTH-domains, looped-hinge helix domains, Zn-fingers

In addition to the most prevalent RHH domains, crenarchaeal viruses encode a considerable number of other predicted transcriptional regulators. The majority of viruses have at least one protein containing an HTH domain but none of these genes appear to be orthologs, suggesting multiple, independent routes of acquisition. For example, PSV encodes three closely related paralogous HTH-domain proteins whose closest homologs are predicted transcription regulators from Pyrobaculum aerophilum, the host of this virus (Table 1S and data not shown). There are no counterparts to this protein family in the related virus TTSV1. In this case, the evolutionary scenario appears clear and consists of acquisition of a host gene with two subsequent duplications within the PSV genome.
In contrast, the two HTH-domain proteins of STIV show no specific similarity to each other or HTH-domain proteins from Crenarchaeota; they are distantly related to a variety of bacterial transcriptional regulators such that their exact origin is impossible to infer; SIFV also encodes a pair of HTH-domain proteins of uncertain provenance (Table 2S). Two crenarchaeal viruses, STSV1 and SIFV, encode members of another class of prokaryotic transcription regulators, the SpoVT/AbrB-like proteins that have the so-called looped-hinge helix fold, a variation of the β-barrel (Coles et al., 2005; Huffman and Brennan, 2002). The provenance of the looped-hinge helix proteins in the two viruses is obviously different (Table 1S). The STSV1 protein, which contains a duplication of the looped-hinge helix domain, is closely related to several predicted transcription regulators from Sulfolobus and is a relatively recent acquisition from the host. In contrast, the SIFV protein has no closely related homologs and is only remotely similar to several bacterial proteins of this class, leaving the origin of the viral gene unclear. An interesting feature of several crenarchaeal viral genomes is the presence of C2H2 Zn-finger proteins with moderate similarity to a variety of eukaryotic Zn-fingers but no obvious homologs in prokaryotes (Fig. 3b). The C2H2 finger proteins of AFV, SSV, and ATV are obviously orthologous but STIV and SIRV each have a distinct protein of this class that is more closely related to eukaryotic fingers than to those of other crenarchaeal


Fig. 3. Multiple alignments of predicted transcriptional regulators shared by multiple Crenarchaeal viruses. (a) RHH domains. Each protein is denoted by the virus abbreviation and the RefSeq identification number; multiple copies of the RHH domain in STIV proteins are designated 1 through 4 from the N-terminus. The range of aligned amino acid positions in each protein is indicated by numbers in front of the sequence. Asterisks indicate the end of the respective sequences. Below the alignment, the secondary structure elements derived from the crystal structure of CopG (Gomis-Ruth et al., 1998) are shown; the arrow indicates the β-strand and the cylinders indicate the α-helices. The conserved hydrophobic residues are shown by bold type and shading, and the characteristic small residue in the turn between the two helices is shown by reverse type. To the right of the alignment, three distinct groups of crenarchaeal viral RHH-domain proteins are indicated: 1, orthologous set of small proteins from AFV1, ATV, SIRV1, and STIV (with duplications in AFV1 and ATV); 2, paralogous STSV1 proteins, typically, with multiple RHH domains; 3, ParB-like RHH domain of SIFV and SIRV1. The sequences of archaeal and bacterial RHH domains shown for comparison are separated from the crenarchaeal viral sequences by a blank line. Additional abbreviations: Hmo, Haloarcula marismortui; Pal, Pseudomonas alcaligenes; Sag, Streptococcus agalactiae; Sso, Sulfolobus solfataricus. (b) C2H2 Zn-fingers. The conserved cysteines and histidines are shown by reverse type. The rest of the designations are the same as in (a). The alignments were constructed using the MACAW program (Schuler et al., 1991).
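The conserved cysteines and histidines highlighted in panel (b) follow the spacing of the classic C2H2 consensus, C-x(2,4)-C-x(12)-H-x(3,5)-H. As a rough illustration of how candidate fingers can be flagged in a protein sequence, the sketch below scans for that spacing with a regular expression. It is a simplification and not the method used in the paper, which relied on profile searches and manual inspection; the test sequence is synthetic.

```python
import re
from typing import List, Tuple

# Simplified C2H2 consensus: C-x(2,4)-C-x(12)-H-x(3,5)-H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_candidates(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


# Synthetic sequence with one finger-like segment starting at position 2.
synthetic = "MK" + "C" + "PE" + "C" + "GKSFSQKSNLQK" + "H" + "QRT" + "H" + "TG"
candidates = find_c2h2_candidates(synthetic)
```

A pattern of this kind is permissive on residues between the metal ligands, so matches are only candidates; divergent viral fingers may also miss the canonical spacing and escape detection.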

viruses (Fig. 3b and Table 1S). Interestingly, PSV, which does not have C2H2 Zn-fingers, instead encodes two C2C2 Zn-finger proteins, also distantly related only to some eukaryotic proteins. Involvement in transcription regulation is the most likely function of viral Zn-finger proteins but their origin remains mysterious. Conceivably, the genes for these proteins are highly mobile elements of archaeal genomes that are carried by viruses but remain to be identified in a cellular archaeal genome. The finding of these unusual proteins in crenarchaeal viral genomes

may have implications for the origin of eukaryotic Zn-fingers, which are among the most common DNA-binding modules in eukaryotes.

4.4. P-loop ATPases implicated in replication, packaging and other functions

The P-loop ATP/GTPase domain is the most abundant protein domain in prokaryotes (Wolf et al., 1999) and is also encoded by


the great majority of viruses including those with small genomes (Gorbalenya and Koonin, 1989). Typically, viral P-loop proteins are nucleic-acid-stimulated ATPases (including helicases) that are involved in viral replication, transcription or packaging. The crenarchaeal viruses are no exception in that each of them encodes at least one P-loop ATPase (Tables 3 and 4); again, however, the relationships between these proteins are complex, suggesting multiple evolutionary scenarios. Thus, AFV, SIRV and ARV1 encode orthologous ATPases related to the ATPase domains of the bacterial and archaeal Lon proteases, whereas SSV and STSV1 have an orthologous pair of ATPases that belong to the same AAA+ class (Iyer et al., 2004a) but show a completely different affinity within it, i.e., are specifically related to bacterial DnaA, the ATPase involved in replication initiation. Furthermore, ATV encodes yet a third AAA+ ATPase, one moderately similar to CDC48, a crucial molecular chaperone of archaea and eukaryotes (Elsasser and Finley, 2005). Three viruses, PSV, TTSV1, and STIV, encode ATPases of the FtsK-HerA superfamily which consists of proteins implicated in DNA-pumping into the daughter cells in bacteria and archaea, and viral DNA packaging (Iyer et al., 2004b); while the PSV

and TTSV1 proteins are obvious orthologs, the one from STIV is only distantly related to them and does not seem to have the same origin. Finally, SIFV encodes no ATPases related to those of other viruses but instead has two distinct superfamily II helicases, both of apparent archaeal origin, and ARV1 has an ABC-class ATPase that is absent in the related SIRV genome and might have been acquired from archaea relatively recently (Table 4). The repertoire of P-loop ATPases found in Crenarchaeal viruses dramatically differs from that of Euryarchaeal viruses (Table 2S). None of the sequenced euryarchaeal viral genomes encodes an ATPase of the AAA+ class, and only the haloviruses have any ATPase (a helicase) other than the large subunit of the phage-like terminase.

4.5. Other proteins implicated in replication, DNA precursor metabolism, and RNA modification

Other predicted enzymes of crenarchaeal viruses with probable functions in DNA replication include the RecB-family endonuclease, XerC/D-like integrase, archaeal-type Holliday

Table 4
Unique genes of Crenarchaeal viruses with homologs in archaea and/or bacteria (a)

Gene | Length (# amino acids) | Homologs | Predicted function

ATV
ATV ORF330 | 330 | Numerous bacterial and some eukaryotic homologs | Membrane-associated acyltransferase
ATV ORF529 | 529 | CDC48-like AAA ATPase, equally similar to archaeal and bacterial homologs, no close homologs in archaeal viruses | ATPase involved in initiation of replication
ATV ORF241 | 241 | Homologs in Methanococcus, Sulfolobus, Thermococcus, more distant in many bacteria | Integrase/recombinase
ATV ORF545 | 545 | Primarily archaeal and more distant bacterial homologs | Multitransmembrane protein
ATV ORF209 | 209 | Distant homologs in bacteria | Metal-dependent protease of the PAD1/JAB1 superfamily

STSV1
YP_077205.1 | 700 | Orthologs in most archaea, thermophilic bacteria | Adenine-specific DNA methylase
YP_077254.1 | 412 | Orthologs in many bacteria, some archaea | Cytosine-specific DNA methylase similar to PspGI
YP_077212.1 | 187 | Homologous to the RNA-binding PUA domain of archaeal archaeosine tRNA-ribosyltransferase | Predicted RNA-binding protein
YP_077259.1 | 297 | Homologs in many archaea and bacteria | Cytosine-specific DNA modification methylase
YP_077251.1 | 142 | Distant similarity to a variety of bacterial acetyltransferases | Acetyltransferase
YP_077243.1 | 280 | Distant similarity to bacterial ParB-like nucleases | Nuclease, potentially, specific for supercoiled DNA; 5′-3′ exonuclease activity also possible
YP_077258.1 | 320 | Homologs in many bacteria and archaea | Nucleoside-diphosphate-sugar epimerase

SIFV
NP_445672.1 | 601 | Orthologs in archaea and bacteria | Ski2-like SF2 helicase
NP_445687.1 | 559 | Orthologs in all archaea, eukaryotes | Rad25-like SF2 helicase
NP_666548.1 | 399 | Ortholog in S. tokodaii, more distant orthologs in other archaea and bacteria | Permease, multitransmembrane protein
NP_666615.1 | 121 | Orthologs in all archaea | Archaeal-type Holliday junction resolvase

ARV1
ARV1 ORF210 | 210 | Homologs in S. solfataricus and more distant homologs in all archaea and bacteria | ABC-class ATPase

(a) Several functionally uncharacterized genes with archaeal homologs were omitted (see Supplementary Table 1S).


junction resolvase, and some other nucleases (Tables 3 and 4). All of these enzymes may be involved in intermediate resolution during viral genome replication. It seems likely that the RecB endonuclease is the nicking enzyme that initiates the replication of the SIRV genome by cleaving the terminal hairpin (Blum et al., 2001). In addition, STSV1 encodes three modification methylases that probably methylate host DNA. Two enzymes, dUTPase and flavin-dependent thymidylate synthase (ThyX), are involved in DNA precursor metabolism, the class of functions that is widely represented in bacteriophages and eukaryotic viruses with large DNA genomes (Iyer et al., 2006). Interestingly, ThyX is present in ARV1 but not in the related SIRV, and the ThyX sequence of ARV1 is closely related to those of STSV1 and Sulfolobus. This suggests a relatively recent acquisition of this enzyme from the host, most likely, independently by ARV1 and STSV1. SIRV and STSV1 encode queuine/archaeosine tRNA-ribosyltransferase, a tRNA modification enzyme present in all archaea, and SIFV, SIRV and ARV1 have a predicted S-adenosylmethionine-dependent methyltransferase that, judging by similarity to various archaeal and bacterial methyltransferases, is likely to be an RNA methylase. As with enzymes of DNA precursor metabolism, the presence of RNA modification enzymes is not unique to crenarchaeal viruses: some enzymes of this functional class are encoded by bacteriophages with large genomes, such as T-even phages (Miller et al., 2003).

4.6. Enzymes implicated in virion morphogenesis and modification of the host cell wall

Several crenarchaeal viruses encode diverse glycosyltransferases (Tables 3 and 1S) that may be involved in modification of virion proteins and/or the host cell wall during viral entry and/or release (Markine-Goriaynoff et al., 2004).
In addition, some of the individual viruses encode enzymes that may be implicated in the same processes, such as a membrane-associated acyltransferase (ATV) and nucleoside-diphosphate-sugar epimerase (STSV) (Table 4). ATV encodes an interesting protein in this functional category, a predicted JAMM-family metalloprotease (Table 4). The proteins of this family are nearly ubiquitous in prokaryotes although their functions are poorly understood except for involvement in murein metabolism in proteobacteria; additionally, they have been recruited as minor tail components in lambdoid bacteriophages and as proteasomal deubiquitinating enzymes in eukaryotes (Aravind and Ponting, 1998; Verma et al., 2002). An intriguing possibility is that this predicted protease might be involved in the unique tail morphogenesis process observed in ATV (Haring et al., 2005c).

5. Orthologous genes and monophyly versus polyphyly in the evolution of Crenarchaeal viruses

To assess the relationships between viruses as distinct, autonomous entities, it is necessary to delineate the sets of orthologous genes they share. We obtained a preliminary census of orthologous genes among Crenarchaeal viruses by examining the results of sequence similarity searches using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) as well as manually (a definitive identification of orthologs requires comprehensive phylogenetic analysis that may be extremely error-prone in the case of viruses and was not undertaken in the course of the present work). By definition, orthologs are genes that derive from the same ancestral gene in the genome of the last common ancestor of the compared organisms (Fitch, 1970; Koonin, 2005). However, in the case of viruses, the notion of orthology is confounded by the uncertainty regarding the very existence of such a common ancestor as a unique virus, even if it is obvious that the compared extant viruses share a gene pool. Therefore, orthologs can be defined only conditionally, as genes that are more closely related to each other in a given set of viruses than they are to any homologs that may exist outside that set of genomes; of course, the most conspicuous cases are those where such homologs are undetectable. The inferred numbers of such “conditional orthologs” between all pairs of sequenced Crenarchaeal viral genomes are shown in Fig. 1.
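The conditional definition of orthology given above can be expressed as a simple comparison of similarity scores. The following sketch assumes that pairwise similarity scores (for instance, alignment bit scores) are already available; the scoring scheme, the gene labels, and the numbers are hypothetical and serve only to illustrate the criterion, not to reproduce the census reported in the paper.

```python
from typing import Dict, FrozenSet, Iterable

Scores = Dict[FrozenSet[str], float]


def pair_score(scores: Scores, gene_a: str, gene_b: str) -> float:
    """Similarity score of an unordered gene pair; 0.0 if no similarity was detected."""
    return scores.get(frozenset((gene_a, gene_b)), 0.0)


def is_conditional_ortholog_pair(
    gene_a: str,
    gene_b: str,
    outside_genes: Iterable[str],
    scores: Scores,
) -> bool:
    """True if two viral genes are more similar to each other than either
    is to any homolog outside the analyzed set of viral genomes."""
    within = pair_score(scores, gene_a, gene_b)
    if within <= 0.0:
        return False  # no detectable similarity between the two viral genes
    best_outside = max(
        (pair_score(scores, g, o) for g in (gene_a, gene_b) for o in outside_genes),
        default=0.0,
    )
    return within > best_outside


# Hypothetical scores: pair 1 passes, pair 2 is closer to a cellular homolog.
toy_scores: Scores = {
    frozenset(("virusA_gene1", "virusB_gene1")): 250.0,
    frozenset(("virusA_gene1", "cell_gene1")): 80.0,
    frozenset(("virusB_gene1", "cell_gene1")): 75.0,
    frozenset(("virusA_gene2", "virusB_gene2")): 60.0,
    frozenset(("virusA_gene2", "cell_gene2")): 120.0,
}
outside = ["cell_gene1", "cell_gene2"]
```

When no outside homolog is detectable at all, the maximum defaults to zero and any detectable similarity between the two viral genes satisfies the criterion, which corresponds to the "most conspicuous cases" mentioned in the text.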
The results of these comparisons reveal four tiers of relationships (apart from the closely related virus isolates like SIRV1 and 2): (i) two pairs of viruses that each share a substantial fraction of orthologous genes and obviously derive from, perhaps, relatively recent common ancestors: SIRV-ARV1 and PSV-TTSV1; (ii) viruses that represent three distinct, moderately related groups with 6–9 orthologous genes: SIRV/ARV-AFV-SIFV; these are likely to have evolved from the same ancestral virus in a more distant past; (iii) the rest of the viruses that share one to four orthologs; in these cases, it remains unclear whether a common ancestral virus ever existed or the shared genes result from HGT between the viruses or, in some cases, independent acquisitions from the hosts; (iv) PSV-TTSV1 versus the rest of the crenarchaeal viruses—no orthologous genes detected. The most prominent clusters of apparent orthologs in crenarchaeal viruses are one of the RHH-domain proteins (exemplified by AFV1 p08; Fig. 3a), which is represented in AFV1, SIRV, SSV, and ATV, and a family of small proteins homologous to the uncharacterized Bacillus subtilis protein YddF that is present in STIV, SIRV, SIFV, AFV1/2, and ATV. The RHH proteins actually could be, at least in part, pseudo-orthologs (Koonin, 2005) because they are highly similar to the homologs from Sulfolobus and might have been acquired independently by at least some of the viruses. The evolution of the YddF family, as well as its function, is more mysterious as it has a single bacterial representative, with all other members found in crenarchaeal viruses. At face value, this seems to be the strongest case for HGT between otherwise unrelated crenarchaeal viruses like STIV and the SIRV/ARV-SIFV-AFV group, and also between crenarchaeal viruses and bacteria. 
In general, however, the consistency of orthologous relationships between the crenarchaeal viruses is low such that the connections within the SIRV/ARV-SIFV-AFV group are formed by largely non-overlapping sets of orthologs; moreover, SIRV and ARV share several orthologous genes with AFV2 but not AFV1 (Table 1S) emphasizing the plasticity of the crenarchaeal viral genomes. This type of “transitive” relationships between viral genomes strikingly resembles the relationships between genomes of tailed bacteriophages


(Casjens, 2003, 2005). The only signature of this group of crenarchaeal viruses (not counting the two aforementioned families with a wider distribution) is a distinct family of glycosyltransferases (typified by SIFV0046; Table 1S). These observations are compatible with the idea that gene sampling from a shared pool contributed to the evolution of crenarchaeal viruses.

6. General discussion and conclusions: unique and generic aspects of archaeal virus evolution

The comparative-genomic analysis described here, along with previous studies, revealed a strange world of crenarchaeal viruses. These viruses share a small pool of genes among themselves and more genes with their hosts. When a conserved core of orthologous genes shared by distinct viruses, like the SIRV/ARV-AFV-SIFV cluster, is apparent, the conclusion on common ancestry of viruses themselves, as genetic entities, seems to be justified. More commonly, however, viruses share a very small number of genes, which is best compatible with the HGT scenario, i.e., mixing and matching of genes within the viral pool, with occasional exchange with the hosts as well. Some crenarchaeal viruses like PSV and TTSV1 share no orthologous genes with other viruses; their genomes are almost a terra incognita in which even sequence analysis pushed to the limit, as described here, revealed only a few genes with predictable functions but uncertain provenance. Beyond the presence of several common, widespread domains like HTH and the P-loop ATPase (and, to a lesser extent, RHH), crenarchaeal viruses share virtually no homologous genes with viruses from other prokaryotic divisions including those viruses of euryarchaea for which genome sequences are available.
There are very few exceptions, such as the flavin-dependent thymidylate synthase, which is encoded in the genomes of some crenarchaeal, euryarchaeal, and bacterial viruses, but the respective viral proteins do not seem to be specifically related, making independent acquisition from the hosts the most likely evolutionary scenario. As mentioned above, the SIRV genomic DNA has terminal closed hairpin structures resembling those of animal poxviruses, which prompted the discussion of the possibility of a common origin for these viruses (Blum et al., 2001; Peng et al., 2001; Prangishvili, 2003). However, given the clear evolutionary relationships between SIRV and other crenarchaeal viruses that have distinct terminal structures, such as AFV and SIFV, and the lack of any homologous relationship between SIRV genes and those of eukaryotic viruses, such an evolutionary connection does not seem to be tenable. Apparently, the similarity between the terminal structures in the genomes of crenarchaeal and eukaryotic viruses is due to convergence and might reflect similarities in the replication mechanisms. While several crenarchaeal viruses have unique morphotypes, STIV has an icosahedral capsid with a striking resemblance to the capsids of certain eukaryotic viruses (adenoviruses, phycodnaviruses) and bacteriophages, e.g., PRD1 (Hendrix, 2004; Rice et al., 2004). Based on this similarity and the compatibility of the predicted secondary structure of the major capsid protein of STIV and the experimentally determined atomic structures of the major coat protein of PRD1 and adenovirus hexon, it has been proposed that the capsids of all these viruses share a common origin (Rice et al., 2004). Indeed, this prediction has been fully borne out by the recent determination of the crystal structure of the major capsid protein of STIV, which turned out to be highly similar not only to the capsid proteins of icosahedral DNA viruses but also to those of RNA viruses, such as cowpea mosaic virus (Khayat et al., 2005). These findings leave no doubt that the capsid protein of STIV is homologous to the jelly-roll capsid proteins that are nearly universal among icosahedral viruses from all walks of life (Bamford et al., 2005a; Nandhagopal et al., 2002). These remarkable findings, along with the presence of a gene for a FtsK-HerA-family packaging ATPase in some of the crenarchaeal viruses, expand the notion of the common pool of viral genes reaching across domain boundaries, as already evidenced by the considerable number of genes shared by bacteriophages and eukaryotic viruses, although the contribution of crenarchaeal viruses to this pool so far seems to be minuscule (Iyer et al., 2006). On the whole, comparative-genomic analysis clearly indicates that crenarchaeal viruses, as distinct replicating entities, are unrelated to any other viruses and have a unique origin or, more likely, multiple origins. The results seem to be compatible with assembly of the viral genomes from genes scavenged from the hosts, with some acquisitions being relatively recent as suggested by high level of sequence conservation and others being ancient such that little or no evidence of cellular ancestry is retained in the viral protein sequences. Alternatively, it is conceivable that the genes perceived as ancient acquisitions from the host might descend directly from the primordial pool of genetic elements that predated the emergence of archaea and bacteria as distinct cellular entities (Koonin and Martin, 2005).
This emerging picture of crenarchaeal virus evolution is quite different from what can be deciphered from the analysis of the genomes of viruses infecting mesophilic and moderately thermophilic Euryarchaea, in that the euryarchaeal viruses, such as phiCh1 or HF1/2, have multiple structural proteins clearly related to those of bacteriophages and so belong to the bacteriophage gene pool (Table 2S). The distinct evolutionary histories of crenarchaeal and euryarchaeal viruses might be explained either by the biological differences between the two archaeal divisions or by a distinct spectrum of selective pressures and relative genetic isolation imposed by the hyperthermophilic lifestyle of the hosts of the crenarchaeal viruses sequenced to date. On general grounds, the latter explanation seems more plausible; the issue will be resolved when genome sequences of viruses from hyperthermophilic Euryarchaea and/or mesophilic Crenarchaea become available.

Given that viruses encode, to a large extent, proteins involved in genetic processes and taking into consideration the well-established evolutionary relationship between the information processing systems of archaea and eukaryotes, one might expect that archaea and eukaryotes would harbor related viruses. Furthermore, archaeal viruses might have been implicated as ancestors of at least some groups of eukaryotic viruses. As shown here, this is patently not so, and at least crenarchaeal viruses are not related to any other known viruses. However, although unrelated
evolutionarily, the genomescapes of crenarchaeal viruses, in their general features, are not unlike those of temperate bacteriophages. Indeed, a substantial fraction of encoded proteins with recognizable domains and predictable functions are implicated in the regulation of viral gene expression, and only one or two genes per genome, typically ATPases and, in some cases, nucleases, appear to be involved in genome replication and encapsidation. This limited resemblance to bacteriophages in terms of encoded functions appears to be due to independent, convergent accretion of functionally similar genes.

Acknowledgements

We thank W. Zillig, M. Young, and L. Huang for providing images of viruses used in Fig. 1. This work was supported by the Agence Nationale de la Recherche (France), Programme Blanc, and the Intramural Research Program of the NIH, National Library of Medicine.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.virusres.2006.01.007.

References

Ackermann, H.W., 1998. Tailed bacteriophages: the order Caudovirales. Adv. Virus Res. 51, 135–201.
Ackermann, H.W., 2003. Bacteriophage observations and evolution. Res. Microbiol. 154, 245–251.
Ahn, D.-G., Kim, S.-I., Rhee, J.-K., Kim, K.P., Oh, J.-W., 2004. Ttsv1, a novel Globuloviridae family virus isolated from the hyperthermophilic crenarchaeote Thermoproteus tenax. GenBank Accession: NC_006556.
Altschul, S.F., Koonin, E.V., 1998. PSI-BLAST: a tool for making discoveries in sequence databases. Trends Biochem. Sci. 23, 444–447.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Aravind, L., Anantharaman, V., Balaji, S., Babu, M.M., Iyer, L.M., 2005. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. 29, 231–262.
Aravind, L., Koonin, E.V., 1999a. DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 27, 4658–4670.
Aravind, L., Koonin, E.V., 1999b. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. 287, 1023–1040.
Aravind, L., Ponting, C.P., 1998. Homologues of 26S proteasome subunits are regulators of transcription and translation. Protein Sci. 7, 1250–1254.
Arnold, H.P., Ziese, U., Zillig, W., 2000a. SNDV, a novel virus of the extremely thermophilic and acidophilic archaeon Sulfolobus. Virology 272, 409–416.
Arnold, H.P., Zillig, W., Ziese, U., Holz, I., Crosby, M., Utterback, T., Weidmann, J.F., Kristjansson, J.K., Klenk, H.P., Nelson, K.E., Fraser, C.M., 2000b. A novel lipothrixvirus, SIFV, of the extremely thermophilic crenarchaeon Sulfolobus. Virology 267, 252–266.
Bamford, D.H., 2003. Do viruses form lineages across different domains of life? Res. Microbiol. 154, 231–236.
Bamford, D.H., Grimes, J.M., Stuart, D.I., 2005a. What does structure tell us about virus evolution? Curr. Opin. Struct. Biol. 15, 655–663.
Bamford, D.H., Ravanti, J.J., Rönnholm, G., Laurinavičius, S., Kukkaro, P., Dyall-Smith, M., Somerharju, P., Kalkkinen, N., Bamford, J.K.H., 2005b. Constituents of SH1, a novel lipid-containing virus infecting the halophilic euryarchaeon Haloarcula hispanica. J. Virol. 79, 9097–9107.
Bath, C., Dyall-Smith, M.L., 1998. His1, an archaeal virus of the Fuselloviridae family that infects Haloarcula hispanica. J. Virol. 72, 9392–9395.
Bertani, G., Baresi, L., 1986. Looking for gene transfer mechanisms in methanogenic bacteria. In: Kandler, O., Zillig, W. (Eds.), Archaebacteria ’85. Gustav Fischer Verlag, Stuttgart.
Bettstetter, M., Peng, X., Garrett, R.A., Prangishvili, D., 2003. AFV1, a novel virus infecting hyperthermophilic archaea of the genus Acidianus. Virology 315, 68–79.
Birkenbihl, R.P., Neef, K., Prangishvili, D., Kemper, B., 2001. Holliday junction resolving enzymes of archaeal viruses SIRV1 and SIRV2. J. Mol. Biol. 309, 1067–1076.
Black, L.W., 1989. DNA packaging in dsDNA bacteriophages. Annu. Rev. Microbiol. 43, 267–292.
Blum, H., Zillig, W., Mallok, S., Domdey, H., Prangishvili, D., 2001. The genome of the archaeal virus SIRV1 has features in common with genomes of eukaryal viruses. Virology 281, 6–9.
Brown, J.R., Doolittle, W.F., 1997. Archaea and the prokaryote-to-eukaryote transition. Microbiol. Mol. Biol. Rev. 61, 456–502.
Casjens, S., 2003. Prophages and bacterial genomics: what have we learned so far? Mol. Microbiol. 49, 277–300.
Casjens, S.R., 2005. Comparative genomics and evolution of the tailed bacteriophages. Curr. Opin. Microbiol. 8, 451–458.
Cheng, H., Shen, N., Pei, J., Grishin, N.V., 2004. Double-stranded DNA bacteriophage prohead protease is homologous to herpesvirus protease. Protein Sci. 13, 2260–2269.
Coles, M., Djuranovic, S., Soding, J., Frickey, T., Koretke, K., Truffault, V., Martin, J., Lupas, A.N., 2005. AbrB-like transcription factors assume a swapped hairpin fold that is evolutionarily related to double-psi beta barrels. Structure (Camb) 13, 919–928.
Cordes, M.H., Walsh, N.P., McKnight, C.J., Sauer, R.T., 1999. Evolution of a protein fold in vitro. Science 284, 325–328.
Daniels, L.L., Wais, A.C., 1984. Restriction and modification of halophage S45 in Halobacterium. Curr. Microbiol. 10, 133–136.
Daniels, L.L., Wais, A.C., 1990. Ecophysiology of bacteriophage S5100 infecting Halobacterium cutirubrum. Appl. Environ. Microbiol. 56, 3605–3608.
DeLange, A.M., Reddy, M., Scraba, D., Upton, C., McFadden, G., 1986. Replication and resolution of cloned poxvirus telomeres in vivo generates linear minichromosomes with intact viral hairpin termini. J. Virol. 59, 249–259.
Dyall-Smith, M., Tang, S.L., Bath, C., 2003. Haloarchaeal viruses: how diverse are they? Res. Microbiol. 154, 309–313.
Edgell, D.R., Doolittle, W.F., 1997. Archaea and the origin(s) of DNA replication proteins. Cell 89, 995–998.
Elsasser, S., Finley, D., 2005. Delivery of ubiquitinated substrates to protein-unfolding machines. Nat. Cell Biol. 7, 742–749.
Fitch, W.M., 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–106.
Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., Prieur, D., 2003. PAV1, the first virus-like particle isolated from a hyperthermophilic euryarchaeote “Pyrococcus abyssi”. J. Bacteriol. 185, 3888–3894.
Golovanov, A.P., Barilla, D., Golovanova, M., Hayes, F., Lian, L.Y., 2003. ParG, a protein required for active partition of bacterial plasmids, has a dimeric ribbon-helix-helix structure. Mol. Microbiol. 50, 1141–1153.
Gomis-Ruth, F.X., Sola, M., Acebo, P., Parraga, A., Guasch, A., Eritja, R., Gonzalez, A., Espinosa, M., del Solar, G., Coll, M., 1998. The structure of plasmid-encoded transcriptional repressor CopG unliganded and bound to its operator. EMBO J. 17, 7404–7415.
Gorbalenya, A.E., Koonin, E.V., 1989. Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res. 17, 8413–8440.
Gropp, F., Grampp, B., Stolt, P., Palm, P., Zillig, W., 1992. The immunity-conferring plasmid p phi HL from the Halobacterium salinarium phage phi H: nucleotide sequence and transcription. Virology 190, 45–54.
Gropp, F., Palm, P., Zillig, W., 1989. Expression and regulation of Halobacterium halobium phage phi H genes. Can. J. Microbiol. 35, 182–188.
Haring, M., Peng, X., Brugger, K., Rachel, R., Stetter, K.O., Garrett, R.A., Prangishvili, D., 2004. Morphology and genome organization of the virus PSV of the hyperthermophilic archaeal genera Pyrobaculum and Thermoproteus: a novel virus family, the Globuloviridae. Virology 323, 233–242.
Haring, M., Rachel, R., Peng, X., Garrett, R.A., Prangishvili, D., 2005a. Viral diversity in hot springs of Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus bottle-shaped virus, from a new family, the Ampullaviridae. J. Virol. 79, 9904–9911.
Haring, M., Vestergaard, G., Brugger, K., Rachel, R., Garrett, R.A., Prangishvili, D., 2005b. Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures. J. Bacteriol. 187, 3855–3858.
Haring, M., Vestergaard, G., Rachel, R., Chen, L., Garrett, R.A., Prangishvili, D., 2005c. Virology: independent virus development outside a host. Nature 436, 1101–1102.
Hendrix, R.W., 2004. Hot new virus, deep connections. Proc. Natl. Acad. Sci. U.S.A. 101, 7495–7496.
Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., Hatfull, G.F., 1999. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc. Natl. Acad. Sci. U.S.A. 96, 2192–2197.
Huffman, J.L., Brennan, R.G., 2002. Prokaryotic transcription regulators: more than just the helix-turn-helix motif. Curr. Opin. Struct. Biol. 12, 98–106.
Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75, 11720–11734.
Iyer, L.M., Balaji, S., Koonin, E.V., Aravind, L., 2006. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res. 117, 156–184.
Iyer, L.M., Koonin, E.V., Leipe, D.D., Aravind, L., 2005. Origin and evolution of the archaeo-eukaryotic primase superfamily and related palm-domain proteins: structural insights and new members. Nucleic Acids Res. 33, 3875–3896.
Iyer, L.M., Leipe, D.D., Koonin, E.V., Aravind, L., 2004a. Evolutionary history and higher order classification of AAA+ ATPases. J. Struct. Biol. 146, 11–31.
Iyer, L.M., Makarova, K.S., Koonin, E.V., Aravind, L., 2004b. Comparative genomics of the FtsK-HerA superfamily of pumping ATPases: implications for the origins of chromosome segregation, cell division and viral capsid packaging. Nucleic Acids Res. 32, 5260–5279.
Janekovic, D., Wunderl, S., Holz, I., Zillig, W., Gierl, A., Neumann, H., 1983. TTV1, TTV2 and TTV3, a family of viruses of the extremely thermophilic, anaerobic sulfur reducing archaebacterium Thermoproteus tenax. Mol. Gen. Genet. 192, 39–45.
Jordan, M., Meile, L., Leisinger, T., 1989. Organisation of Methanobacterium thermoautotrophicum bacteriophage M1 DNA. Mol. Gen. Genet. 220, 161–164.
Ken, R., Hackett, N.R., 1991. Halobacterium halobium strains lysogenic for phage phi H contain a protein resembling coliphage repressors. J. Bacteriol. 173, 955–960.
Kessler, A., Brinkman, A.B., van der Oost, J., Prangishvili, D., 2004. Transcription of the rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon Sulfolobus. J. Bacteriol. 186, 7745–7753.
Khayat, R., Tang, L., Larson, E.T., Lawrence, C.M., Young, M., Johnson, J.E., 2005. From the cover: structure of an archaeal virus capsid protein reveals a common ancestry to eukaryotic and bacterial viruses. Proc. Natl. Acad. Sci. U.S.A. 102, 18944–18949.
Klein, R., Baranyi, U., Rossler, N., Greineder, B., Scholz, H., Witte, A., 2002. Natrialba magadii virus phiCh1: first complete nucleotide sequence and functional organization of a virus infecting a haloalkaliphilic archaeon. Mol. Microbiol. 45, 851–863.
Knopf, C.W., 1998. Evolution of viral DNA-dependent DNA polymerases. Virus Genes 16, 47–58.
Koonin, E.V., 2005. Orthologs, paralogs and evolutionary genomics. Annu. Rev. Genet. 39, 309–338.
Koonin, E.V., Makarova, K.S., Aravind, L., 2001. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742.
Koonin, E.V., Martin, W., 2005. On the origin of genomes and cells within inorganic compartments. Trends Genet. 21, 647–654.
Koonin, E.V., Mushegian, A.R., Galperin, M.Y., Walker, D.R., 1997. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25, 619–637.
Kuo, T.T., Tu, J., 1976. Enzymatic synthesis of deoxy-5-methyl-cytidylic acid replacing deoxycytidylic acid in Xanthomonas oryzae phage Xp12 DNA. Nature 263, 615.
Lawrence, J.G., Hendrickson, H., 2003. Lateral gene transfer: when will adolescence end? Mol. Microbiol. 50, 739–749.
Leipe, D.D., Aravind, L., Koonin, E.V., 1999. Did DNA replication evolve twice independently? Nucleic Acids Res. 27, 3389–3401.
Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., Bork, P., 2004. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 32, D142–D144.
Luo, Y., Pfister, P., Leisinger, T., Wasserfallen, A., 2001. The genome of archaeal prophage PsiM100 encodes the lytic enzyme responsible for autolysis of Methanothermobacter wolfeii. J. Bacteriol. 183, 5788–5792.
Marchler-Bauer, A., Bryant, S.H., 2004. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32 (Web Server issue), W327–W331.
Markine-Goriaynoff, N., Gillet, L., Van Etten, J.L., Korres, H., Verma, N., Vanderplasschen, A., 2004. Glycosyltransferases encoded by viruses. J. Gen. Virol. 85, 2741–2754.
Martin, A., Yeats, S., Janekovic, D., Reiter, W.-D., Aicher, W., Zillig, W., 1984. SAV1, a temperate UV-inducible DNA virus-like particle from the archaebacterium Sulfolobus acidocaldarius isolate B12. EMBO J. 3, 2165–2168.
Meile, L., Jenal, U., Studer, D., Jordan, M., Leisinger, T., 1989. Characterization of M1, a virulent phage of Methanobacterium thermoautotrophicum Marburg. Arch. Microbiol. 152, 105–110.
Miller, E.S., Kutter, E., Mosig, G., Arisaka, F., Kunisawa, T., Ruger, W., 2003. Bacteriophage T4 genome. Microbiol. Mol. Biol. Rev. 67, 86–156.
Nandhagopal, N., Simpson, A.A., Gurnon, J.R., Yan, X., Baker, T.S., Graves, M.V., Van Etten, J.L., Rossmann, M.G., 2002. The structure and evolution of the major capsid protein of a large, lipid-containing DNA virus. Proc. Natl. Acad. Sci. U.S.A. 99, 14758–14763.
Neumann, H., Schwass, V., Eckerskorn, C., Zillig, W., 1989. Identification and characterization of the genes encoding three structural proteins of the Thermoproteus tenax virus TTV1. Mol. Gen. Genet. 217, 105–110.
Nolling, J., Groffen, A., M., D.W., 1993. F1 and F3, two novel virulent, archaeal phages infecting different thermophilic strains of the genus Methanobacterium. J. Gen. Microbiol. 139, 2511–2516.
Nuttall, S.D., Dyall-Smith, M.L., 1993. HF1 and HF2: novel bacteriophages of halophilic archaea. Virology 197, 678–684.
Nuttall, S.D., Dyall-Smith, M.L., 1995. Halophage HF2: genome organization and replication strategy. J. Virol. 69, 2322–2327.
Palm, P., Schleper, C., Grampp, B., Yeats, S., McWilliam, P., Reiter, W.D., Zillig, W., 1991. Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus shibatae. Virology 185, 242–250.
Pauling, C., 1982. Bacteriophages of Halobacterium halobium: isolation from fermented fish sauce and primary characterization. Can. J. Microbiol. 28, 916–921.
Pedulla, M.L., Ford, M.E., Houtz, J.M., Karthikeyan, T., Wadsworth, C., Lewis, J.A., Jacobs-Sera, D., Falbo, J., Gross, J., Pannunzio, N.R., Brucker, W., Kumar, V., Kandasamy, J., Keenan, L., Bardarov, S., Kriakov, J., Lawrence, J.G., Jacobs Jr., W.R., Hendrix, R.W., Hatfull, G.F., 2003. Origins of highly mosaic mycobacteriophage genomes. Cell 113, 171–182.
Peng, X., Blum, H., She, Q., Mallok, S., Brugger, K., Garrett, R.A., Zillig, W., Prangishvili, D., 2001. Sequences and replication of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipothrixvirus SIFV and some eukaryal viruses. Virology 291, 226–234.
Peng, X., Kessler, A., Phan, H., Garrett, R.A., Prangishvili, D., 2004. Multiple variants of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol. Microbiol. 54, 366–375.
Perez-Rueda, E., Collado-Vides, J., Segovia, L., 2004. Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput. Biol. Chem. 28, 341–350.
Pfister, P., Wasserfallen, A., Stettler, R., Leisinger, T., 1998. Molecular analysis of Methanobacterium phage psiM2. Mol. Microbiol. 30, 233–244.
Porter, K., Kukkaro, P., Bamford, J.K., Bath, C., Kivela, H.M., Dyall-Smith, M.L., Bamford, D.H., 2005. SH1: a novel, spherical halovirus isolated from an Australian hypersaline lake. Virology 335, 22–33.
Prangishvili, D., 2003. Evolutionary insights from studies on viruses of hyperthermophilic archaea. Res. Microbiol. 154, 289–294.
Prangishvili, D., Arnold, H.P., Gotz, D., Ziese, U., Holz, I., Kristjansson, J.K., Zillig, W., 1999. A novel virus family, the Rudiviridae: structure, virus-host interactions and genome variability of the Sulfolobus viruses SIRV1 and SIRV2. Genetics 152, 1387–1396.
Prangishvili, D., Garrett, R.A., 2004. Exceptionally diverse morphotypes and genomes of crenarchaeal hyperthermophilic viruses. Biochem. Soc. Trans. 32, 204–208.
Prangishvili, D., Garrett, R.A., 2005. Viruses of hyperthermophilic Crenarchaea. Trends Microbiol. 13, 535–542.
Prangishvili, D., Stedman, K., Zillig, W., 2001. Viruses of the extremely thermophilic archaeon Sulfolobus. Trends Microbiol. 9, 39–43.
Rachel, R., Bettstetter, M., Hedlund, B.P., Haring, M., Kessler, A., Stetter, K.O., Prangishvili, D., 2002. Remarkable morphological diversity of viruses and virus-like particles in hot terrestrial environments. Arch. Virol. 147, 2419–2429.
Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M., 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306, 1344–1350.
Raumann, B.E., Rould, M.A., Pabo, C.O., Sauer, R.T., 1994. DNA recognition by beta-sheets in the Arc repressor-operator crystal structure. Nature 367, 754–757.
Reiter, W.D., Palm, P., Voos, W., Kaniecki, J., Grampp, B., Schulz, W., Zillig, W., 1987. Putative promoter elements for the ribosomal RNA genes of the thermoacidophilic archaebacterium Sulfolobus sp. strain B12. Nucleic Acids Res. 15, 5581–5595.
Reiter, W.D., Zillig, W., Palm, P., 1988. Archaebacterial viruses. Adv. Virus Res. 34, 143–188.
Rice, G., Stedman, K., Snyder, J., Wiedenheft, B., Willits, D., Brumfield, S., McDermott, T., Young, M.J., 2001. Viruses from extreme thermal environments. Proc. Natl. Acad. Sci. U.S.A. 98, 13341–13345.
Rice, G., Tang, L., Stedman, K., Roberto, F., Spuhler, J., Gillitzer, E., Johnson, J.E., Douglas, T., Young, M., 2004. The structure of a thermophilic archaeal virus shows a double-stranded DNA viral capsid type that spans all domains of life. Proc. Natl. Acad. Sci. U.S.A. 101, 7716–7720.
Rohrmann, G.F., Cheney, R., Pauling, C., 1983. Bacteriophages of Halobacterium halobium: virion DNAs and proteins. Can. J. Microbiol. 29, 627–629.
Rohwer, F., 2003. Global phage diversity. Cell 113, 141.
Rossler, N., Klein, R., Scholz, H., Witte, A., 2004. Inversion within the haloalkaliphilic virus phi Ch1 DNA results in differential expression of structural proteins. Mol. Microbiol. 52, 413–426.
Schleper, C., Kubo, K., Zillig, W., 1992. The particle SSV1 from the extremely thermophilic archaeon Sulfolobus is a virus: demonstration of infectivity and of transfection with viral DNA. Proc. Natl. Acad. Sci. U.S.A. 89, 7645–7649.
Schnabel, H., Schnabel, R., Yeats, S., Tu, J., Gierl, A., Neumann, H., Zillig, W., 1984. Genome organization and transcription in archaebacteria. Folia Biol. (Praha) 30 Spec. No., 2–6.
Schnabel, H., Zillig, W., Pfaffle, P., Schnabel, R., Michel, H., Delius, H., 1982. Halobacterium halobium phage phi H. EMBO J. 1, 87–92.
Schuler, G.D., Altschul, S.F., Lipman, D.J., 1991. A workbench for multiple alignment construction and analysis. Proteins 9, 180–190.
Snyder, J.C., Stedman, K., Rice, G., Wiedenheft, B., Spuhler, J., Young, M.J., 2003. Viruses of hyperthermophilic Archaea. Res. Microbiol. 154, 474–482.
Somers, W.S., Phillips, S.E., 1992. Crystal structure of the met repressor-operator complex at 2.8 Å resolution reveals DNA recognition by beta-strands. Nature 359, 387–393.
Stedman, K.M., She, Q., Phan, H., Arnold, H.P., Holz, I., Garrett, R.A., Zillig, W., 2003. Relationships between fuselloviruses infecting the extremely thermophilic archaeon Sulfolobus: SSV1 and SSV2. Res. Microbiol. 154, 295–302.
Stolt, P., Grampp, B., Zillig, W., 1994. Genes for DNA cytosine methyltransferases and structural proteins, expressed during lytic growth by the phage phi H of the archaebacterium Halobacterium salinarium. Biol. Chem. Hoppe Seyler 375, 747–757.
Stolt, P., Zillig, W., 1993. Antisense RNA mediates transcriptional processing in an archaebacterium, indicating a novel kind of RNase activity. Mol. Microbiol. 7, 875–882.
Stolt, P., Zillig, W., 1994a. Gene regulation in halophage phi H: more than promoters. Syst. Appl. Microbiol. 16, 591–596.
Stolt, P., Zillig, W., 1994b. Transcription of the halophage phi H repressor gene is abolished by transcription from an inversely oriented lytic promoter. FEBS Lett. 344, 125–128.
Suzuki, M., 1995. DNA recognition by a beta-sheet. Protein Eng. 8, 1–4.
Tang, S.L., Nuttall, S., Dyall-Smith, M., 2004. Haloviruses HF1 and HF2: evidence for a recent and large recombination event. J. Bacteriol. 186, 2810–2817.
Tang, S.L., Nuttall, S., Ngui, K., Fisher, C., Lopez, P., Dyall-Smith, M., 2002. HF2: a double-stranded DNA tailed haloarchaeal virus with a mosaic genome. Mol. Microbiol. 44, 283–296.
Torsvik, T., 1982. Characterization of four bacteriophages for Halobacterium with special emphasis on phage Hs1. In: Kandler, O. (Ed.), Archaebacteria. Gustav Fischer Verlag, Stuttgart, New York, pp. 351.
Torsvik, T., Dundas, I.D., 1974. Bacteriophage of Halobacterium salinarium. Nature 248, 680–681.
Torsvik, T., Dundas, I.D., 1980. Persisting phage infection in Halobacterium salinarium str1. J. Gen. Virol. 47, 29–36.
Verma, R., Aravind, L., Oania, R., McDonald, W.H., Yates, I.J., Koonin, E.V., Deshaies, R.J., 2002. Role of Rpn11 Metalloprotease in deubiquitination and degradation by the 26S Proteasome. Science 298, 611–615.
Vestergaard, G., Haring, M., Peng, X., Rachel, R., Garrett, R.A., Prangishvili, D., 2005. A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology 336, 83–92.
Vogelsang-Wenke, H., Oesterhelt, D., 1988. Isolation of a halobacterial phage with a fully cytosine-methylated genome. Mol. Gen. Genet. 211, 407–414.
Wais, A.C., Kon, M., MacDonald, R.E., Stollar, B.D., 1975. Salt-dependent bacteriophage infecting Halobacterium cutirubrum and H. halobium. Nature 256, 314–315.
Wiedenheft, B., Stedman, K., Roberto, F., Willits, D., Gleske, A.K., Zoeller, L., Snyder, J., Douglas, T., Young, M., 2004. Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses. J. Virol. 78, 1954–1961.
Witte, A., Baranyi, U., Klein, R., Sulzner, M., Luo, C., Wanner, G., Kruger, D.H., Lubitz, W., 1997. Characterization of Natronobacterium magadii phage phi Ch1, a unique archaeal phage containing DNA and RNA. Mol. Microbiol. 23, 603–616.
Wolf, Y.I., Brenner, S.E., Bash, P.A., Koonin, E.V., 1999. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9, 17–26.
Wood, A.G., Whitman, W.B., Konisky, J., 1989. Isolation and characterization of an archaebacterial viruslike particle from Methanococcus voltae A3. J. Bacteriol. 171, 93–98.
Xiang, X., Chen, L., Huang, X., Luo, Y., She, Q., Huang, L., 2005. Sulfolobus tengchongensis spindle-shaped virus STSV1: virus-host interactions and genomic features. J. Virol. 79, 8677–8686.
Zillig, W., Kletzin, A., Schleper, C., Holz, I., Janekovic, D., Hain, J., Lanzendörfer, M., Kristjansson, J., 1994. Screening for Sulfolobales, their plasmids, and their viruses in Icelandic solfataras. Syst. Appl. Microbiol. 16, 609–628.