The genome of Epstein–Barr virus type 2 strain AG876

The genome of Epstein–Barr virus type 2 strain AG876

Virology 350 (2006) 164 – 170 www.elsevier.com/locate/yviro The genome of Epstein–Barr virus type 2 strain AG876 Aidan Dolan, Clare Addison, Derek Ga...

170KB Sizes 0 Downloads 9 Views

Virology 350 (2006) 164 – 170 www.elsevier.com/locate/yviro

The genome of Epstein–Barr virus type 2 strain AG876 Aidan Dolan, Clare Addison, Derek Gatherer, Andrew J. Davison, Duncan J. McGeoch ⁎ Medical Research Council Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK Received 5 December 2005; returned to author for revision 3 January 2006; accepted 12 January 2006 Available online 21 February 2006

Abstract Two Epstein–Barr virus (EBV) types are known, EBV1 and EBV2, which possess substantially diverged alleles for latency genes EBNA-2, EBNA-3A, EBNA-3B and EBNA-3C but are thought to be otherwise similar. We report the first complete EBV2 genome sequence, for strain AG876, as 172,764 bp. The sequence was interpreted as containing at least 80 protein coding genes. Comparison with the published EBV1 sequence demonstrated that the two sequences are collinear and, outside the known diverged alleles, generally very close. The EBNA-1 gene was identified as another diverged locus, although its variation is believed not to correlate with EBV type. Patterns of substitution between the two genomes presented a wide spectrum of classes of change. No evidence was seen for involvement of B-cell-specific hypermutation systems in generation of the diverged alleles. Overall, genomic comparisons indicated that the two EBV types should be regarded as belonging to the same virus species. © 2006 Elsevier Inc. All rights reserved. Keywords: Herpesviridae; Gammaherpesvirinae; Lymphocryptovirus; Human herpesvirus 4; Viral genomics; Viral evolution; Oncogenic virus; Viral latency

Introduction Epstein–Barr virus (EBV) (family, Herpesviridae; subfamily, Gammaherpesvirinae; genus, Lymphocryptovirus; species, Human herpesvirus 4) is present at high frequencies in all human populations and has been investigated extensively, primarily on account of its association with several human malignancies. The complete genomic sequence of EBV strain B95-8 (derived from a case of infectious mononucleosis; Miller and Lipman, 1973) was published by Baer et al. (1984) and has since played a central role in EBV research. It emerged during the 1980s that there are two distinct types of EBV, referred to here as EBV1 and EBV2 (the terms EBV A and B are also used) (reviewed by Kieff and Rickinson, 2001; Zimber et al., 1986). The sequenced B95-8 strain is of type 1, but no complete type 2 genome sequence has been available. The types are defined by their versions of four latent cycle genes (encoding EBNA-2, EBNA-3A, EBNA-3B and EBNA-3C), which show pronounced differences in their sequences between EBV1 and EBV2 (Dambaugh et al., 1984; Rowe et al., 1989; Sample et al., 1990). The EBNA-3 genes form a contiguous genomic block, ⁎ Corresponding author. Fax: +44 141 337 2236. E-mail address: [email protected] (D.J. McGeoch). 0042-6822/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.virol.2006.01.015

and the EBNA-2 gene is in a non-adjacent location; all these genes are thought to act in the establishment or maintenance of latent states of EBV in growth-transformed B lymphocytes (reviewed by Kieff and Rickinson, 2001). Apart from these four genes, the genomic sequences of the two types are judged to be generally closely similar, as evaluated by restriction mapping and limited sequencing of type 2 strains. Another latent cycle gene, EBNA-1, is also known to vary among strains, but this variation is not thought to correlate with the EBV1 and EBV2 types (Bhatia et al., 1996; Habeshaw et al., 1999; MacKenzie et al., 1999; Wrightham et al., 1995). EBNA-1 has roles in initiation of genome replication and transcriptional regulation during latency (reviewed by Leight and Sugden, 2000). Both EBV types are widely distributed in human populations and appear to have near equivalent biological properties, although lymphocytes transformed to an indefinite growth phenotype by EBV1 propagate more vigorously than cells transformed by EBV2 (Rickinson et al., 1987), and EBNA-2 is a major determinant of this phenotype (Cohen et al., 1989). While most EBV isolates examined are cleanly of one type or the other, occurrences of natural recombinants that disrupt the typical allele associations have been reported (Burrows et al., 1996; Midgley et al., 2000; Shim et al., 1998; Yao et al., 1996).

A. Dolan et al. / Virology 350 (2006) 164–170

Both the evolutionary origin of the two EBV types and any functional significance of the differences in four latent cycle genes remain obscure. In principle, the observed pattern of diverged alleles in two genomic regions might have arisen by recombination events involving at least two ancestral lymphocryptoviruses (LCVs) or through processes of mutation within a single lineage; neither of these possible routes presents as unproblematic (McGeoch, 2001; McGeoch and Davison, 1999). The existence of the two genomic types is thus a challenge for our understanding of the evolutionary antecedents of this ubiquitous human pathogen. In order to analyze further the phenomenon of EBV types, we have determined the complete genome sequence for the commonly studied EBV2 isolate AG876, which originated from a Ghanaian case of Burkitt's lymphoma (Pizzo et al., 1978). In this paper, we report characteristics of the AG876 sequence and employ it to compile and evaluate a genome-wide comparison with the EBV1 B958 sequence. In addition, the availability of the EBV2 genomic sequence afforded an opportunity to update details of the EBV1 B95-8 sequence and its annotation, and we extended comparative evaluations to the two other LCVs whose genomes have been sequenced, Cercopithecine herpesvirus 15 (CeHV-15; rhesus lymphocryptovirus) (Rivailler et al., 2002a) and Callitrichine herpesvirus 3 (CalHV-3; marmoset lymphocryptovirus) (Rivailler et al., 2002b). Results Determination of the genome sequence of EBV2 strain AG876 The complete genomic DNA sequence of EBV2 strain AG876 was determined using eight cosmid-cloned fragments, supplemented by PCR amplification of selected regions. A listing of the AG876 sequence was assembled to correspond to conventions used in the data library entries for the EBV1 B958 sequence (accession numbers V01555 and AJ507799). The AG876 sequence as determined corresponded to the circular, latent plasmid form of the genome. The linear listing of the rightward 5′ to 3′ strand was defined as starting one nucleotide 5′ to the sequence equivalent to the EcoRI site separating the EcoRI Dhet and EcoRI I fragments in strain B95-8. This scheme places sequences for the terminal repeat element adjacent to the end of the listing, with three repeat copies of 558, 544 and 558 nuc as found in the cosmid sequenced (the EBV1 entry has four copies). Sequences representing the major internal repeat element IR1, of 3070 nuc per copy in AG876, were edited to give 7.6 copies as for the current EBV1 database entry (AJ507799). These extended from nuc 11,986 to nuc 35,323. The EBV genome contains many families of tandem repeats of short sequence elements, in some cases showing sequence variations among copies. Certain of these families presented problems in sequence determination, which were handled as follows. For the repeat family region (RF; nuc 7450 to 8019) within the latent cycle origin of DNA replication oriP (Yates et al., 1984), PCR analyses showed that the cosmid-cloned version of RF was partly deleted relative to that in DNA of the lymphoblastoid cell line (LCL) carrying EBV AG876, and

165

sequencing templates for this region were therefore obtained by PCR amplification from LCL DNA. The final sequence obtained for the RF element comprised 19 copies of 30 nuc, with some sequence differences among copies, and was considered to be the best representation achievable; we note that sequence instabilities in RF have been described by Fruscalzo et al. (2001). The repeat set IR2 (nuc 38,281 to 39,944; repeat length 125 nuc; four classes of variant) was set to 13 copies as determined for strain AG876 by Dambaugh and Kieff (1982). The detail of sequence variants in an internal portion of six copies of the set was not resolved. We encountered a repeat set in the second exon of the EBNA-3B gene (approximately nuc 85,241 to 85,980; 12 copies; repeat length 60 nuc; some sequence variation), where a previous analysis of this gene in strain AG876 showed only three repeat copies (Sample et al., 1990). We considered our sequence of this set as most likely fully resolved. We note that the EBNA3B sequence has been reported for another EBV2 strain (MISP), with 15 copies in this repeat family (Inoue et al., 1992). For the IR3 set (nuc 96,759 to 97,466, a complex set of triplet repeats within the coding sequence of the EBNA-1 gene), a version was produced that accounted parsimoniously for all the shotgun reads of the region, and whose length was compatible with the analysis of Heller et al. (1982); this set was considered as most likely fully resolved. The IR4 set (nuc 141,591 to 144,696; repeat length 102 and 103 nuc; three classes of variant) was set to 30 copies, as determined by Dambaugh and Kieff (1982). In this set, the detail of sequence variants in an internal portion of 25 copies was not resolved. Sequences have been published for a number of loci within the AG876 genome. These include the EBER transcript region (Arrand et al., 1989); the EBNA-2 gene (Dambaugh et al., 1984); repeat families IR2 and IR4 (Dambaugh and Kieff, 1982); the BLLF1 (gp340) gene and neighboring sequences (Lees et al., 1993); the EBNA-3A, EBNA-3B and EBNA-3C genes (Sample et al., 1990); part of the EBNA-1 gene (Wrightham et al., 1995); the LMP-1 gene (Sample et al., 1994); and part of the LMP-2A gene (Busson et al., 1995). In all, these prior data represented 16% of the AG876 sequence. We found them to be identical to our versions or almost so, with the one substantive exception being the repeat set in the EBNA-3B gene mentioned above. All differences between the EBV2 sequence and the EBV1 sequence, apart from the diverged EBNA-2 and EBNA-3 alleles, were checked as a late-stage validation of the EBV2 sequence. The assembled AG876 genome sequence (accession number DQ279927) contained 172,764 nuc, base composition 59.5% G+C. Gene content of EBV2 strain AG876 The genome sequences of EBV1 and EBV2 AG876 are closely collinear and, with the previously known exceptions of the genes for EBNA-2 and the EBNA-3s, are near identical. It was thus straightforward to transfer EBV1's genome annotation to the AG876 sequence to yield the same gene layout. Two frame shifting differences between the EBV1 and EBV2 sequences were observed in regions judged plausibly protein coding, at EBV2 residues 135,774/135,775 (a C residue was present

166

A. Dolan et al. / Virology 350 (2006) 164–170

between these nucleotides in EBV1) and 136,056 (a C residue was absent from EBV1). In collaboration with P. J. Farrell (Imperial College, London), both were found experimentally to represent errors in the EBV1 sequence. Interpretation of this locus was then revised, resulting in definition of a new open reading frame (ORF), BVLF1. We assessed all potential protein coding ORFs in EBV2 by criteria of overlap with other ORFs and by comparisons with the sequences of EBV1, CeHV-15 and CalHV-3, with the aim of producing a conservative listing of visible protein coding genes. Three further ORFs (BFRF1A, BGLF3.5 and BDLF3.5) previously not annotated in EBV1 were identified. These three ORFs and BVLF1 all have orthologues in the genus Rhadinovirus (the other genus of the Gammaherpesvirinae) and in the Betaherpesvirinae, and two have orthologues in the Alphaherpesvirinae. Details of these four EBV2 genes are listed in Table 1. Five other ORFs were redefined from their EBV1 versions to start with ATG codons, according to modern practice (BBLF2/BBLF3, BGLF4, BTRF1, BALF3 and BNLF2b). Overall, we identified with high confidence 80 genes encoding proteins in the EBV2 genome, with the members of two pairs of genes that overlap in the same reading frame (LMP-2A/LMP-2B and BVRF2/BdRF1) counted separately. In addition, there were several putative EBV2 coding regions, transferred from the EBV1 annotation, for which we reached a position of significant doubt that they encode functional proteins. These evaluations were based on considerations that included (in different cases): positioning or lack of splicing or translation signals; content of repeat sets whose units were not a multiple of 3 nuc (and would thus translate cyclically in every possible reading frame); extensive overlap with other ORFs judged as genuine; and non-conservation in the CeHV-15 or CalHV-3 sequences. Published data on expression of encoded proteins were also considered. ORFs in this category were BWRF1, BLLF2, BHLF1, LF3 and several (including BARF0, A73 and RPMS1) encoded by a complex family of alternatively spliced RNAs termed the BamHI A rightward transcripts (BARTs) (Hitt et al., 1989; Sadler and Raab-Traub, 1995; Smith et al., 1993). These dubious assignments range from quite extensively analyzed genomic regions to little-studied features that have been listed in the EBV1 annotation for many years and may represent no more than loose ends from early efforts to comprehend EBV1's gene complement. Our comparisons also identified several discrepancies between the EBV2 sequence and those of EBV's closest

sequenced relatives, CeHV-15 and CalHV-3. These were investigated experimentally in collaboration with F. Wang (Harvard Medical School) and corrections made in the accessions for CeHV-15 and CalHV-3. The genes defined or corrected were in CeHV-15, EBNA-LP, BFRF1A, BcRF1 and BVLF1, and in CalHV-3, BVLF1, BXLF2 and BGRF1/BDRF1. Comparative analyses of the EBV1 and EBV2 genome sequences The EBV1 and AG876 sequences were aligned using CLUSTAL W to give an alignment of length 173,695 characters. Fig. 1 shows the locations and magnitudes of insertion/ deletion (indel) and substitutional differences between the aligned DNA sequences. The figure emphasizes that over most of their lengths, the two genomes are highly similar, with few indels and a sparse background of substitution divergence, while the strongly diverged regions are sharply delimited. The locations of variable genomic features that have already been mentioned are indicated in the figure, plus that of the LMP-1 gene, and five other variable loci are labeled 1 to 5. The EBNA-2, EBNA-3A, EBNA-3B and EBNA-3C genes, whose distinct alleles define the two EBV types, are the loci of major peaks for both classes of sequence differences. The EBNA-1 gene, situated to the right of the EBNA-3 block, also shows substantial indel and substitution divergence. The diverged loci labeled as 2 and 3 lie within the coding regions of, respectively, the BPLF1 large tegument protein and the BLLF1 gp340 virion glycoprotein, while locus 4 represents an indel in an intron of the BZLF1 gene. In all cases, these indel differences within genes result mainly from distinct copy numbers in families of short repeat sequences. In the LMP-1 gene, there are two closely spaced deletions relative to EBV1 in EBV2 coding sequence, of 30 and 15 nuc (the latter in a repeat family); these deletions were previously noted by Sample et al. (1994). Turning to diverged loci not in recognized genes, the repeat families IR2, IR4 and TR register as indel peaks primarily because of the differing numbers of repeat copies in the two sequences. The RF element of oriP appears as a markedly variable locus, and the study of Fruscalzo et al. (2001) has shown that this repeat set is particularly volatile, with repeat copies varying pronouncedly in sequence both within one genome and among virus isolates, in addition to variability in copy number. As an added complication, we found the RF region to be deletion

Table 1 Novel genes annotated in the genome of EBV2 Gene

Sense

Coding region (nucs)

Size (codons)

Rhadinovirus orthologue a

Alphaherpes. orthologue a

Betaherpes. orthologue a

Attributes of gene product b

BVLF1 BFRF1A BGLF3.5 BDLF3.5

− + − −

135,636–136,454 46,352–46,759 112,035–112,496 117,539–117,772

272 135 153 77

ORF18 ORF67A ORF35 ORF30

– UL33 UL14 –

UL79 UL51 UL96 UL91

– Role in DNA packaging Tegument component –

a Orthologues in the Rhadinovirus genus, the Alphaherpesvirinae subfamily and the Betaherpesvirinae subfamily were represented respectively by genes of equine herpesvirus 2 (U20824), herpes simplex virus type 1 (X14112) and human cytomegalovirus (AY446894). b Attributes were derived from the above database entries.

A. Dolan et al. / Virology 350 (2006) 164–170

167

Fig. 1. Comparisons of the aligned EBV1 and EBV2 genome sequences. The complete EBV1 and EBV2 sequences were aligned using CLUSTAL W, to give an alignment of overall length 173,695 residues including gapping characters. The GCG program PLOTSIMILARITY was then used to display separately the incidences of addition/deletion differences (A) and of substitution differences (B) between the two genomes; for each measure, values are plotted as percentage differences over a moving 600-character window. In panel A, the three highest peaks are shown truncated. Locations of variable regions are annotated with abbreviated names or with numerals (1 to 5).

prone when cloned in a cosmid, giving reduced confidence in any supposedly complete sequence. Lastly, there are two indel differences not associated with multicopy tandem arrays: labeled feature 1 marks a 105-nuc sequence downstream of the EBNA-2 transcript, which is duplicated in EBV2 but not in EBV1, and feature 5 denotes a 74-nuc deletion in EBV1 relative to EBV2 in an apparently non-coding region near the right origin of lytic DNA replication. Indel and nucleotide substitutional differences accounted for 1.61% and 1.38% respectively of the complete alignment. When the diverged EBNA-2 and EBNA-3 genes are excluded, the values fall to 1.10% and 0.73% respectively. Such overall levels of differences are comparable to those seen between whole genome sequences for different isolates of other human herpesvirus species (data not shown). We were interested in the possibility that the diverged alleles in EBV1 and EBV2 might have arisen through the action of some specific mutagenic process. We considered two such, namely the system for hypermutational maturation of antibody affinity and processes of methylation and deamination of cytosine residues in DNA. In the first of these, the enzyme known as B cell activation-induced cytidine deaminase (AID) is known to have a central role (Muramatsu et al., 2000; Revy et al., 2000). In activated B cells, AID acts to deaminate cytosine residues to uracil in the V regions of transcribed immunoglobulin genes with target preference for the motif 5′-RGYW/5′-WRCY (Rogozin and Kolchanov, 1992; Yoshikawa et al., 2002). Given that the EBNA genes are expressed during latent states

of EBV in B cells, we regarded as plausible that AID might act as a specific mutator of EBNA genes. Regarding methylation/deamination, it is known that enzymatic methylation of cytosine residues in the sequence 5′-CG plays a role in EBV latency (reviewed by Tao and Robertson, 2003). Since non-enzymatic hydrolytic deamination of 5′-methylcytosine to thymine occurs at an appreciable rate, such methylation of DNA leads to mutation and in fact has done so on a large scale in the EBV genome as evidenced by general scarcity of the sequence 5′-CG (Honess et al., 1989). We examined patterns of substitution between the diverged EBNA gene regions in the whole genome alignment, and separately between the remaining regions in the alignment, by computing frequencies of each single nucleotide difference in the contexts of 5′-neighbors, 3′-neighbors and more extended adjacent sequences. In summary, all possible classes of substitution were present, in terms of the residue differing and its immediate 5′ or 3′ neighbors, and similar patterns were seen for the EBNA-2 plus EBNA-3 loci and for the rest of the genome (data not shown). We did not detect any indication that AID had played a marked role in mutational divergence of the EBV1 and EBV2 EBNA alleles. Changes involving 5′-CG were an abundant class, and presumably, these resulted at least in part from cytosine methylation and deamination. Relative levels of synonymous and non-synonymous differences between EBV1 and EBV2 in the EBNA-2 gene and the EBNA-3 genes were overall indicative of purifying selection rather than neutral or positive selection (McGeoch and Davison, 1999).

168

A. Dolan et al. / Virology 350 (2006) 164–170

Discussion The two types of EBV are defined by their possession of substantially diverged alleles of four latent cycle genes. Our complete sequence for an EBV2 strain now shows definitively that a very high overall level of similarity exists across all other regions of the EBV1 and EBV2 genomes, equivalent to that seen between different isolates in other herpesvirus species. Most of the small number of other marked differences between the two EBV sequences represent variable copy numbers in families of tandemly repeated elements and, as such, can be discounted as likely to be of little or no functional significance. We thus consider that EBV1 and EBV2 are to be regarded as definitely members of the same species, differing significantly only in allelic differences in a small subset of their genes. The sizeable difference between the two alleles of the EBNA-1 gene in Fig. 1 is eye catching, especially with its proximity to the EBNA-3 region. However, previous studies of many EBV isolates have demonstrated that the EBNA-1 gene exhibits marked diversity, but that this does not show a strong correlation with EBV type as defined by EBNA-2 and EBNA3 alleles (Bhatia et al., 1996; Habeshaw et al., 1999; MacKenzie et al., 1999; Wrightham et al., 1995). Differences in the LMP-1 gene similarly do not correlate with EBV type (Sample et al., 1994). Conversely, however, Arrand et al. (1989) noted that polymorphisms in the EBER region near the left terminus of the genome did generally correspond with EBV type. Overall, our current view, from comparison of the complete genome sequences and from published data on local polymorphisms for additional virus strains, is that the haplotype pattern seen with EBNA-2, EBNA-3A, EBNA-3B and EBNA-3C genes does not extend strongly to other large scale, substantially diverged features of the EBV genome. Quantitating the extent to which this pattern might apply to variable features along the genome (such as single nucleotide polymorphisms) would require sampling of sequences from additional strains at a number of sites. The evolutionary origins of the patterns of EBV genomic divergence now visible are unclear. Two possible mechanisms for generation of the two EBV types have been mooted (McGeoch, 2001; McGeoch and Davison, 1999). The first proposes that recombination events involving at least two ancestral LCVs yielded two lineages corresponding to EBV1 and EBV2. Problems with this model are that it posits an otherwise undetected parent, and it does not (in our view) account very satisfactorily for the close correspondence between boundaries of diverged regions and the boundaries of latent gene blocks. The second proposed mechanism invokes a process of preferential mutation of the four genes. Setting aside the fact that the underlying reason for specific elevated mutation remains undefined, our main reservation regarding this model is that it does not account for only two distinct alleles for each gene rather than a bush of diverged forms; this feature, however, could be accommodated by proposing additionally that the two lineages occupied non-overlapping niches during the period of allelic divergence. Our examination of single nucleotide differences between sections of the EBV1 and EBV2 genomes did not reveal

any simple pattern of substitution. Nonetheless, and regardless of the precise mechanism, we believe that the fact that the genes with diverged alleles are in active transcriptional states during latent infection does present an attractive rationale for them being uniquely exposed to mutational processes. It is noteworthy that comparisons of EBV genes with those of its nearest sequenced relative, CeHV-15, show that the latent cycle genes are particularly diverged between the two (Rivailler et al., 2002a), consistent with them having a high mutation rate relative to other genes. The occurrence of forms of EBV with reassorted EBNA-2 and EBNA-3 alleles (Burrows et al., 1996; Midgley et al., 2000; Shim et al., 1998; Yao et al., 1996) demonstrates that at the present time, the niches of EBV1 and EBV2 overlap to the extent that dual infections and intertypic recombination occur. Such recombination can also account for the lack of correlation of variation in the EBNA-1 and LMP-1 loci with that in the EBNA-2 gene and the EBNA-3 gene block. With availability of complete sequences for both EBV types, it should now be feasible to use polymorphisms across the genome to arrive at a quantitative evaluation of incidence of recombination in a set of EBV strains and thus a better understanding of the population structure and evolution of this ubiquitous human pathogen. Materials and methods DNA sequence determination Cultures of the LCL carrying EBV2 strain AG876 were gifts from A. B. Rickinson (University of Birmingham, UK) and J. P. Stewart (University of Liverpool, UK). LCL DNAwas subjected to partial digestion with Sau3AI and used to make a cosmid library with Supercos1 (Stratagene) as modified by Cunningham and Davison (1993). The library was probed with 32P-labeled plasmid-cloned EBV1 B95-8 BamHI fragments a, b and o (Arrand et al., 1981) (gifts of J. R. Arrand, Paterson Institute, Manchester, UK) and clones carrying EBV2 AG876 sequences overlapping the probes were isolated and characterized by terminal sequencing. Further probing with selected EBVAG876 cosmids and with a 380-bp PCR-generated fragment representing the left extremity of the genome gave a set of eight cosmids which, on the basis of comparison of their terminal sequences with the EBV1 genomic sequence, covered the entire AG876 genome with overlaps, except for one region of 2.1 kbp adjacent to the left end of the linear genome listing. This gap in the cosmid library was filled by amplifying the target region from LCL DNA by PCR and cloning into M13mp18. DNA sequences of the eight cosmids were determined by shotgun cloning into M13mp18 and dideoxynucleotide methods with an ABI 37796XL sequencer. The RF region in oriP was found to be prone to deletion in the cosmid-cloned version, and sequencing templates to resolve the region were obtained by PCR amplification from LCL DNA across the repeat tract, agarose gel electrophoresis and selection of the maximum length product, and cloning of randomly sheared fragments into the plasmid vector pSTBlue-1 (Novagen). Other problem regions were analyzed by sequencing from specific primers and by sequencing of M13-cloned PCR

A. Dolan et al. / Virology 350 (2006) 164–170

products. Individual cosmid sequences were assembled from shotgun data and edited using GAP4 software with the basecalling program PHRED (Ewing and Green, 1998; Ewing et al., 1998; Staden et al., 2000). Genomic sequences of LCVs used in comparative analyses The data library entry for the complete genome sequence of EBV1 B95-8 (Baer et al., 1984; accession number V01555) has been supplemented with an 11.8 kbp sequence of EBV1 strain Raji (Parker et al., 1990) to make good a deletion specific to B95-8 and updated, to give a representation of the complete EBV1 genome (de Jesus et al., 2003; accession number AJ507799). Primary comparisons were with this sequence. In collaboration with P. J. Farrell (Imperial College, London), we carried out PCR and sequencing of B95-8 DNA that corrected two residual errors. The resulting EBV1 sequence, with an upgraded annotation, is available in accession number NC_ 007605. The genome sequence for a second EBV1 strain, GD1, has very recently become available (Zeng et al., 2005; accession number AY961628), and this was utilized for specific comparisons. Comparative analyses also included the genome sequences of two other LCVs: CeHV-15 (Rivailler et al., 2002a, accession number AY037858) and CalHV-3 (Rivailler et al., 2002b; accession number AF319782). We reassessed these sequences in collaboration with F. Wang (Harvard Medical School) by carrying out PCR and sequencing of regions containing potential errors. Computational analyses of molecular sequences Sequence analyses used the GCG (Accelrys Inc.) and EMBOSS packages. Alignments of whole EBV genome sequences were performed with CLUSTAL W (Thompson et al., 1994). Perl scripts were written to evaluate differences between aligned EBV sequences.

Acknowledgments We thank J. R. Arrand for the gift of plasmid-cloned EBV1 fragments, and A. B. Rickinson and J. P. Stewart for the cultures and DNA of the AG876 cell line. References Arrand, J.R., Rymo, L., Walsh, J.E., Björck, E., Lindahl, T., Griffin, B.E., 1981. Molecular cloning of the complete Epstein–Barr virus genome as a set of overlapping restriction endonuclease fragments. Nucleic Acids Res. 9, 2999–3014. Arrand, J.R., Young, L.S., Tugwood, J.D., 1989. Two families of sequences in the small RNA-encoding region of Epstein–Barr virus (EBV) correlate with EBV types A and B. J. Virol. 63, 983–986. Baer, R., Bankier, A.T., Biggin, M.D., Deininger, P.I., Farrell, P.J., Gibson, T.G., Hatfull, G., Hudson, G.S., Satchwell, S.C., Seguin, C., Tuffnell, P.S., Barrell, B.G., 1984. DNA sequence and expression of the B95-8 Epstein– Barr virus genome. Nature 310, 207–211. Bhatia, K., Raj, A., Gutiérrez, M.I., Judde, J.-G., Spangler, G., Venkatesh, H.,

169

Magrath, I.T., 1996. Variation in the sequence of Epstein Barr virus nuclear antigen 1 in normal peripheral blood lymphocytes and in Burkitt's lymphomas. Oncogene 13, 177–181. Burrows, J.M., Khanna, R., Sculley, T.B., Alpers, M.P., Moss, D.J., Burrows, S.R., 1996. Identification of a naturally occurring recombinant Epstein–Barr virus isolate from New Guinea that encodes both type 1 and type 2 nuclear antigen sequences. J. Virol. 70, 4829–4833. Busson, P., Edwards, R.H., Tursz, T., Raab-Traub, N., 1995. Sequence polymorphism in the Epstein–Barr virus latent membrane protein (LMP)2 gene. J. Gen. Virol. 76, 139–145. Cohen, J.I., Wang, F., Mannick, J., Kieff, E., 1989. Epstein–Barr virus nuclear protein 2 is a key determinant of lymphocyte transformation. Proc. Natl. Acad. Sci. U.S.A. 86, 9558–9562. Cunningham, C., Davison, A.J., 1993. A cosmid-based system for constructing mutants of herpes simplex virus type 1. Virology 197, 116–124. Dambaugh, T.R., Kieff, E., 1982. Identification and nucleotide sequences of two similar tandem direct repeats in Epstein–Barr virus DNA. J. Virol. 44, 823–833. Dambaugh, T., Hennessy, K., Chamnankit, L., Kieff, E., 1984. U2 region of Epstein–Barr virus DNA may encode Epstein–Barr nuclear antigen 2. Proc. Natl. Acad. Sci. U.S.A. 81, 7632–7636. de Jesus, O., Smith, P.R., Spender, L.C., Karstegl, C.E., Niller, H.H., Huang, D., Farrell, P.J., 2003. Updated Epstein–Barr virus (EBV) DNA sequence and analysis of a promoter for the BART (CST, BARF0) RNAs of EBV. J. Gen. Virol. 84, 1443–1450. Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces using Phred: II. Error probabilities. Genome Res. 8, 186–194. Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using Phred: I. Accuracy assessment. Genome Res. 8, 175–185. Fruscalzo, A., Marsili, G., Busiello, V., Bertolini, L., Frezza, D., 2001. DNA sequence heterogeneity within the Epstein–Barr virus family of repeats in the latent origin of replication. Gene 265, 165–173. Habeshaw, G., Yao, Q.Y., Bell, A.I., Morton, D., Rickinson, A.B., 1999. Epstein–Barr virus nuclear antigen 1 sequences in endemic and sporadic Burkitt's lymphoma reflect virus strains prevalent in different geographic areas. J. Virol. 73, 965–975. Heller, M., van Santen, V., Kieff, E., 1982. Simple repeat sequence in Epstein– Barr virus DNA is transcribed in latent and productive infections. J. Virol. 44, 311–320. Hitt, M.M., Allday, M.J., Hara, T., Karran, L., Jones, M.D., Busson, P., Tursz, T., Ernberg, I., Griffin, B.E., 1989. EBV gene expression in an NPC-related tumour. EMBO J. 8, 2639–2651. Honess, R.W., Gompels, U.A., Barrell, B.G., Craxton, M., Cameron, K.R., Staden, R., Chang, Y.N., Hayward, G.S., 1989. Deviations from expected frequencies of CpG dinucleotides in herpesvirus DNAs may be diagnostic of differences in the states of their latent genomes. J. Gen. Virol. 70, 837–855. Inoue, N., Kikuta, H., Yanagi, K., 1992. Comparative sequence analysis of Epstein–Barr virus nuclear antigen-2 and -3B genes of a fresh Epstein–Barr virus-2 isolate. Intervirology 33, 180–186. Kieff, E., Rickinson, A.B., 2001. Epstein–Barr virus and its replication. In: Knipe, D.M., Howley, P.M. (Eds.), Fields Virology, vol. 2. Lippincott Williams and Wilkins, Philadelphia, pp. 2511–2573. Lees, J.F., Arrand, J.E., Pepper, S.DeV., Stewart, J.P., Mackett, M., Arrand, J.R., 1993. The Epstein–Barr virus candidate vaccine antigen gp340/ 220 is highly conserved between virus types A and B. Virology 195, 578–586. Leight, E.R., Sugden, B., 2000. EBNA-1: a protein pivotal to latent infection by Epstein–Barr virus. Rev. Med. Virol. 10, 83–100. MacKenzie, J., Gray, D., Pinto-Pes, R., Barrezueta, L.F.M., Armstrong, A.A., Alexander, F.A., McGeoch, D.J., Jarrett, R.F., 1999. Analysis of Epstein– Barr virus (EBV) nuclear antigen 1 subtypes in EBV-associated lymphomas from Brazil and the United Kingdom. J. Gen. Virol. 80, 2741–2745. McGeoch, D.J., 2001. Molecular evolution of the γ-Herpesvirinae. Philos. Trans. R. Soc. London. B 356, 421–435. McGeoch, D.J., Davison, A.J., 1999. The molecular evolutionary history of the herpesviruses. In: Domingo, E., Webster, R., Holland, J. (Eds.), Origin and Evolution of Viruses. Academic Press, London, SD, pp. 441–465.

170

A. Dolan et al. / Virology 350 (2006) 164–170

Midgley, R.S., Blake, N.W., Yao, Q.Y., Croom-Carter, D., Cheung, S.T., Leung, S.F., Chan, A.T.C., Johnson, P.J., Huang, D., Rickinson, A.B., Lee, S.P., 2000. Novel intertypic recombinants of Epstein–Barr virus in the Chinese population. J. Virol. 74, 1544–1548. Miller, G., Lipman, M., 1973. Release of infectious Epstein–Barr virus by transformed marmoset leukocytes. Proc. Natl. Acad. Sci. U.S.A. 70, 190–194. Muramatsu, M., Kinoshita, K., Fagarasan, S., Yamada, S., Shinkai, Y., Honjo, T., 2000. Class switch recombination and hypermutation require activationinduced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 102, 553–563. Parker, B.D., Bankier, A., Satchwell, S., Barrell, B., Farrell, P.J., 1990. Sequence and transcription of Raji Epstein–Barr virus DNA spanning the B958 deletion region. Virology 179, 339–346. Pizzo, P.A., Magrath, I.T., Chattopadhyay, S.K., Biggar, R.J., Gerber, P., 1978. A new tumour-derived transforming strain of Epstein–Barr virus. Nature 272, 629–631. Revy, P., Muto, T., Levy, Y., Geissmann, F., Plebani, A., Sanal, O., Catalan, N., Forveille, M., Dufourcq-Labelouse, R., Gennery, A., Tezcan, I., Ersoy, F., Kayserili, H., Ugazio, A.G., Brousse, N., Muramatsu, M., Notarangelo, L.D., Kinoshita, K., Honjo, T., Fischer, A., Durandy, A., 2000. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the hyper-IgM syndrome (HIGM2). Cell 102, 565–575. Rickinson, A.B., Young, L.S., Rowe, M., 1987. Influence of the Epstein–Barr virus nuclear antigen EBNA 2 on the growth phenotype of virus-transformed B cells. J. Virol. 61, 1310–1317. Rivailler, P., Jiang, H., Young-gyu, C., Quink, C., Wang, F., 2002a. Complete nucleotide sequence of the rhesus lymphocryptovirus: genetic validation for an Epstein–Barr virus animal model. J. Virol. 76, 421–426. Rivailler, P., Young-gyu, C., Wang, F., 2002b. Complete genomic sequence of an Epstein–Barr virus related herpesvirus naturally infecting a New World primate: a defining point in the evolution of oncogenic lymphocryptoviruses. J. Virol. 76, 12055–12068. Rogozin, I.B., Kolchanov, N.A., 1992. Somatic hypermutagenesis in immunoglobulin genes: II. Influence of neighbouring base sequences on mutagenesis. Biochim. Biophys. Acta 1171, 11–18. Rowe, M., Young, L.S., Cadwallader, K., Petti, L., Kieff, E., Rickinson, A.B., 1989. Distinction between Epstein–Barr virus type A (EBNA 2A) and type B (EBNA 2B) isolates extends to the EBNA 3 family of nuclear proteins. J. Virol. 63, 1031–1039. Sadler, R.H., Raab-Traub, N., 1995. Structural analyses of the Epstein–Barr virus BamHI A transcripts. J. Virol. 69, 1132–1141.

Sample, J., Young, L., Martin, B., Chatman, T., Kieff, E., Rickinson, A., Kieff, E., 1990. Epstein–Barr virus types 1 and 2 differ in their EBNA-3A, EBNA3B, and EBNA-3C genes. J. Virol. 64, 4084–4092. Sample, J., Kieff, E.F., Kieff, E.D., 1994. Epstein–Barr virus types 1 and 2 have nearly identical LMP-1 transforming genes. J. Gen. Virol. 75, 2741–2746. Shim, Y.-K., Kim, C.-W., Lee, W.-K., 1998. Sequence variation of EBNA2 of Epstein–Barr virus isolates from Korea. Mol. Cells 8, 226–232. Smith, P.R., Gao, Y., Karran, L., Jones, M.D., Snudden, D., Griffin, B.E., 1993. Complex nature of the major viral polyadenylated transcripts in Epstein– Barr virus-associated tumors. J. Virol. 67, 3217–3225. Staden, R., Beal, K.F., Bonfield, J.K., 2000. The Staden package 1998. Methods Mol. Biol. 132, 115–130. Tao, Q., Robertson, K.D., 2003. Stealth technology: how Epstein–Barr virus utilizes DNA methylation to cloak itself from immune detection. Clin. Immunol. 109, 53–63. Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680. Wrightham, M.N., Stewart, J.P., Janjua, N.J., Pepper, S.DeV., Sample, C., Rooney, C.M., Arrand, J.R., 1995. Antigenic and sequence variation in the C-terminal unique domain of the Epstein–Barr virus nuclear antigen EBNA1. Virology 208, 521–530. Yao, Q.Y., Tierney, R.J., Croom-Carter, D., Cooper, G.M., Ellis, C.J., Rowe, M., Rickinson, A.B., 1996. Isolation of intertypic recombinants of Epstein-Barr virus from T-cell-immunocompromised individuals. J. Virol. 70, 4895–4903. Yates, J., Warren, N., Reisman, D., Sugden, B., 1984. A cis-acting element from the Epstein–Barr viral genome that permits stable replication of recombinant plasmids in latently infected cells. Proc. Natl. Acad. Sci. U.S.A. 81, 3806–3810. Yoshikawa, K., Okazaki, I., Eto, T., Kinoshita, K., Muramatsu, M., Nagaoka, H., Honjo, T., 2002. AID enzyme-induced hypermutation in an actively transcribed gene in fibroblasts. Science 296, 2033–2036. Zeng, M.S., Li, D.J., Liu, Q.L., Song, L.B., Li, M.Z., Zhang, R.H., Yu, X.J., Wang, H.M., Ernberg, I., Zeng, Y.X., 2005. Genomic sequence analysis of Epstein–Barr virus strain GD1 from a nasopharyngeal carcinoma patient. J. Virol. 79, 15323–15330. Zimber, U., Adldinger, H.K., Lenoir, G.M., Vuillaume, M., Knebel-Doeberitz, M.V., Laux, G., Desgranges, C., Wittmann, P., Freese, U.K., Schneider, U., Bornkamm, G.W., 1986. Geographical prevalence of two types of Epstein– Barr virus. Virology 154, 56–66.