Biochimie (1994) 76, 3-8 © Société française de biochimie et biologie moléculaire / Elsevier, Paris
Patterns of nucleotide sequence variation among cauliflower mosaic virus isolates*
KD Chenault**, U Melcher***
Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, Oklahoma 74078-0454, USA
(Received 19 February 1993; accepted 2 March 1993)
Summary. A consensus nucleotide sequence of the DNA of nine isolates of cauliflower mosaic virus (CaMV) was used to examine variation of nucleotide sequence in CaMV. Variability in coding regions was lowest in open reading frames (ORFs) 1, 2, 3 and 5 and higher in ORFs 4 and 6. Silent substitutions were not uniformly distributed among the ORFs. The large intergenic region was also variable, particularly in loops and bulges of a predicted secondary structure for this region of the 35S RNA transcript. A profile of the frequencies with which consensus nucleotides were replaced by other nucleotides revealed a deficit of A to G transitions and an excess of transversions involving A. Most insertions/deletions could be accounted for by template misalignment during replication. The results suggest that the major source of variation in CaMV DNA sequences is associated with replication by reverse transcription.

caulimoviruses / reverse transcriptase / nucleotide substitution profile / consensus

Introduction
In 1980, Professor Hirth's institute reported [1] the first complete nucleotide (nt) sequence of the nucleic acid of a plant virus, that of the DNA of CaMV. The nt sequence predicted six major ORFs, two minor ORFs, and a large 0.6 kb intergenic region. The large intergenic region in the 35S RNA transcript of CaMV DNA may be folded into an extensive stem-loop structure [2, 3]. Deletions in this region have variously stimulatory or inhibitory effects on the expression of downstream reporter genes [4]. Some insertions [5-7] and some deletions [8] have no visible effect on CaMV DNA infectivity. ORFs 1 to 7 code, respectively, for an intercellular transport protein, an aphid acquisition factor, a DNA-binding protein of unclear function, a precursor to the capsid protein, an RNA-dependent DNA polymerase, an inclusion body protein, and a polypeptide not essential for infection [9].

Caulimoviruses (such as CaMV) and other pararetroviruses (hepadna- and badnaviruses), like retroviruses, employ reverse transcription in their replication cycles [10]. Retroviruses package two molecules of genomic RNA. On infection, the RNA is reverse transcribed to a DNA molecule that integrates into the host genome. The integrated proviral DNA is the template for transcription of genomic RNA. Pararetroviruses, on the other hand, package double-stranded DNA, and integration of this DNA into the host genome is not obligatory for replication. Rather, the DNA resides in nuclei as closed-circular minichromosomes from which genome-length transcripts are produced. Reverse transcription of these transcripts produces the packaged double-stranded DNA. Mutation rates approaching 10⁻² substitutions per site per year occur in retroviruses [11]. The high rates are likely due to the absence of proofreading functions in reverse transcriptases. However, the estimated mutation rates for pararetroviruses [8, 12] are one to two orders of magnitude lower than those of retroviruses. Four to 6 × 10⁻⁴ substitutions per site per plant growth cycle occur in CaMV [8]. This rate of sequence change, lower than expected from the retrovirus-like life cycle of CaMV, is supported by the observation that its intercellular transport protein appears less diverged from a common ancestor than similar proteins encoded by other plant viruses [13]. Since 1980, the complete nt sequences of several other CaMV isolates have been reported [14-16]. With the recent sequencing of the DNAs of three additional isolates [17-19], sufficient sequence information is available to examine nt sequence changes for patterns that may lead to discovery of the mechanisms governing the lower than expected rate of change.

*This article is dedicated to the memory of Professor Leon Hirth. It was omitted by mistake from the two issues of Biochimie (75 (7/8)) dedicated to Prof Hirth.
**Present address: Department of Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA
***Correspondence and reprints

Materials and methods
The complete nt sequences of CaMV isolates BBC, Cabbage S, CM4-184, CM1841, CMV-1, D/H, NY8153 (GenBank/EMBL Accession nos M90542, M10385, J02046, M90543, J02047, M90541, respectively), Cabbage B-JI (J Stanley, personal communication) and XinJing [16] were aligned with the aid of gap translation [13] to position gaps optimally. Optimal positioning was scored using a matrix in which identities, transitions, and transversions or gaps were assigned values of 2, 1, and 0, respectively. The alignment was used to construct a consensus sequence, one residue at a time, by visual inspection. The nt present at a given position in the majority of the sequences was chosen for the consensus. When two nt were equally represented, we chose the nt not present in CM4-184, because of that isolate's close sequence similarity to CM1841 [20]. The CaMV consensus sequence was used as a reference by which to identify and characterize isolate-specific base substitutions, insertions and deletions. Both the number of occurrences of each type of substitution and their frequencies (number of occurrences/fractional base composition) were considered. To test for differences in sequence variability along the CaMV genome, a similarity plot was generated from the aligned sequences (excluding isolate CM4-184). To prevent overscoring of multiple-residue gaps, all gaps were converted to a single-position gap, with the remaining gap positions filled in from the consensus sequence. At each position in the alignment, a similarity score was calculated for all possible pairs of isolates, using the scoring rules cited above. The scores for 50 contiguous positions were summed to give a similarity score for that window. The process was repeated every 25 residues, and the summed scores were plotted as a function of the position of the first residue in the window.

Results
Base substitution profile

Table I. Substitutions in the 8 kb DNA of CaMV isolates relative to a consensus DNA sequence.

Isolate          No of substitutions
BBC              119
Cabbage B-JI     155
Cabbage S        236
CM1841           139
CM4-184          129
CMV-1            154
D/H              236
NY8153           136
XinJing          325

Table II. Substitution frequencies of nt in the (+) DNA strands of CaMV isolates relative to a consensus DNA sequence. Values are the means (± one standard deviation) × 10³, for eight isolates, of the number of each type of substitution divided by the number of nt of the consensus type in the consensus sequence.

Nt in            Nt in isolates
consensus        A              G              C              T
A                -              8.4 ± 2.4      3.7 ± 1.3      4.0 ± 2.0
G                17.0 ± 7.2     -              2.6 ± 2.0      3.3 ± 1.3
C                5.3 ± 3.6      2.4 ± 1.8      -              22.5 ± 8.9
T                5.4 ± 3.8      2.7 ± 1.6      16.8 ± 6.5     -
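The two bookkeeping steps behind Tables I and II, building a majority-rule consensus and expressing each isolate's substitutions per consensus nt, can be sketched in Python. This is a minimal illustration, not the authors' procedure (their consensus was assembled by visual inspection): the function names, the use of '-' for gaps, and the alphabetical fallback for ties left unresolved by the CM4-184 rule are all assumptions.

```python
from collections import Counter

BASES = "ACGT"
PURINES = {"A", "G"}


def majority_consensus(seqs, tiebreak_avoid=None):
    """Majority-rule consensus of equal-length aligned sequences.

    On a tie, prefer a symbol that differs from `tiebreak_avoid` at that
    position (standing in for the CM4-184 rule); any tie that remains is
    broken alphabetically, an assumption made only to keep this sketch
    deterministic.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        best = max(counts.values())
        tied = sorted(sym for sym, n in counts.items() if n == best)
        if len(tied) > 1 and tiebreak_avoid is not None:
            preferred = [sym for sym in tied if sym != tiebreak_avoid[i]]
            tied = preferred or tied
        consensus.append(tied[0])
    return "".join(consensus)


def is_transition(a, b):
    """Purine <-> purine or pyrimidine <-> pyrimidine change."""
    return a != b and (a in PURINES) == (b in PURINES)


def substitution_frequencies(consensus, isolate):
    """Table II-style frequencies for one isolate.

    Each (from, to) count is divided by the number of `from` nt in the
    consensus. Positions with a gap ('-') in either sequence are skipped.
    """
    totals = Counter(b for b in consensus if b in BASES)
    counts = Counter(
        (c, x)
        for c, x in zip(consensus, isolate)
        if c in BASES and x in BASES and c != x
    )
    return {
        (c, x): counts[(c, x)] / totals[c]
        for c in BASES
        for x in BASES
        if c != x and totals[c]
    }


if __name__ == "__main__":
    cons = majority_consensus(["ACGTA", "ACGTT", "ACCTA"])
    print(cons)                                              # ACGTA
    print(substitution_frequencies(cons, "ACGTT")[("A", "T")])  # 0.5
```

Averaging such per-isolate dictionaries over the eight isolates, and reporting mean ± standard deviation, would give a table of the same form as Table II.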
The aligned nt sequences of isolates whose complete sequences were available were used to calculate a CaMV DNA consensus sequence. The base composition of its positive strand was 37% A, 19% G, 23% T, and 21% C, not significantly different from the composition of the DNA of each isolate. In the sequence of each isolate, any nt that differed from that of the consensus at the corresponding position was designated a substitution. This designation is not meant to imply any directionality, since the consensus may not be the ancestral sequence of these DNAs. Of 8030 consensus sequence positions, 997 had substitutions in at least one isolate. Individual isolates had 119 to 325 substitutions relative to the consensus sequence (table I). For each base type, the number of changes from that base in the consensus was approximately equal to the number of changes to that base, suggesting that the CaMV DNA plus strand was stable in base composition. Two anomalies were noted when the kinds of substitutions were examined. Among isolates, the ratio of the number of transitions to the number of transversions varied from 1.2 to 3.0 and averaged 2.1 ± 0.3. For the consensus bases G, C, and T, the ratio of transition to transversion frequencies varied from 2.1 to 2.9 (table II). However, for consensus A bases the transition and transversion frequencies were approximately equal (8.4 and 7.7 × 10⁻³ nt⁻¹) and similar to the transversion frequencies at other bases. That the frequency of transitions from A nt in the consensus (+) strand was half the frequency of transitions from T nt in that strand implies that the mechanism that generated these transitions acted preferentially on one of the two strands of CaMV DNA. The second anomaly was in the purine preference of transversions. For each isolate, transversions involving A outnumbered transversions involving G by a mean ratio of 2.9 ± 0.4.
A ratio of 1.9 was expected from the ratio of A to G in the base composition. Transversion frequencies (table II) suggested that the excess of A ↔ C and A ↔ T transversions may have been due to higher frequencies of transversion to A than of transversion from A. Yet the large variation in frequencies among isolates precluded a statistically sound conclusion, as did the non-equivalence of the consensus sequence with the common ancestor sequence. Phylogenetic analysis (Chenault and Melcher, in preparation) suggested that the North American isolates used in this study have a monophyletic origin. Since the majority of sequences used in constructing the consensus sequence were from North American isolates, the consensus sequence was a good approximation of the North American ancestor sequence. Thus, comparing the proportion of A's changing to a pyrimidine with that of pyrimidines changing to A's among North American isolates should reveal the direction of the mutations causing the excess of transversions involving A's. In these isolates, the proportion of consensus A nt changing to a pyrimidine was twice the proportion of pyrimidines mutating to A's. The difference was significant (P < 0.05) by the t-test. The proportion of pyrimidines changing to A resembled the transversion frequencies involving G's.

Transient template misalignment has been suggested as an important source of retroviral RNA sequence variation [21, 22]. For such variations, the 3' neighboring nt is identical to the base resulting from the substitution (ie the sequence ATTGC would become ATTCC [23]). Nt neighboring isolate-specific base substitutions on the plus and minus DNA strands were examined. An average of 28.5% of the base substitutions in each isolate occurred next to nt identical to the substituted nt. Identity of the neighboring nt was expected for 27% of substitutions based on the nt composition of the consensus sequence. Therefore, no evidence was found that transient template misalignment produces substitutions during CaMV DNA replication.

[Figure 1: summed similarity score plotted against sequence position]

Fig 1. Open reading frame dependence of the interisolate similarity of CaMV nt sequence. The summed similarity scores for all possible pairs of eight isolates over 50 contiguous positions were plotted as a function of the position of the first residue in the window. Windows were evaluated every 25 residues. The score for complete identity is 6400. Boxes above the plot identify positions of open reading frames. Asterisks identify hypervariable regions discussed in the text.

Table III. Silent substitution frequencies in CaMV coding and non-coding regions. Substitutions were tallied relative to the consensus sequence. Values are mean ± one standard deviation of values for each of eight isolates.

Region        Silent substitutions    Density of silent substitutions
              (% of total)            (substitutions/kb)
ORF1          75 ± 14                 14 ± 4
ORF2          69 ± 18                 10 ± 3
ORF3          79 ± 10                 14 ± 4
ORF4          75 ± 12                 22 ± 8
ORF5          90 ± 6                  20 ± 7
ORF6          54 ± 11                 14 ± 8
Intergenic    100                     17 ± 9
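The two quantitative checks in this section, the 27% chance expectation for neighbor identity and the comparison of a (+)-strand substitution with its complementary-strand mirror, can be sketched as follows. The sketch assumes that the expectation is the probability that two nt drawn independently from the consensus composition match (the sum of squared base fractions), which does reproduce 0.27 for 37% A, 19% G, 23% T and 21% C; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def expected_neighbor_identity(composition):
    """Probability that two nt drawn independently from `composition`
    (counts or percentages per base) are identical: sum of p_i squared."""
    total = sum(composition.values())
    return sum((n / total) ** 2 for n in composition.values())


def neighbor_matches(plus_seq, pos, new_base, strand="+"):
    """Does the 3' neighbour of a substitution equal the new base?

    `plus_seq` is the consensus (+) strand and `pos` a 0-based index.
    On the (+) strand the 3' neighbour is pos + 1. On the (-) strand the
    3' neighbour lies at pos - 1 of the (+) sequence; complementing both
    sides of the comparison reduces it to plus_seq[pos - 1] == new_base.
    """
    j = pos + 1 if strand == "+" else pos - 1
    return 0 <= j < len(plus_seq) and plus_seq[j] == new_base


def mirror(substitution):
    """The same event read on the complementary strand, eg A->G on (+)
    is T->C on (-). A strand-symmetric process predicts equal frequencies
    for a substitution type and its mirror."""
    a, b = substitution
    return COMPLEMENT[a], COMPLEMENT[b]


if __name__ == "__main__":
    comp = {"A": 37, "G": 19, "T": 23, "C": 21}
    print(round(expected_neighbor_identity(comp), 2))  # 0.27
    print(neighbor_matches("ATTGC", 3, "C"))           # True: ATTGC -> ATTCC
    print(mirror(("A", "G")))                          # ('T', 'C')
```

Under a strand-symmetric process, A → G on the (+) strand and its mirror T → C should occur at similar frequencies; the observed A → G frequency, half that of T → C, is the asymmetry described above.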
ORF-specific variation
Two approaches were used to examine the dependence of sequence variation in CaMV DNA on map position. In the first, similarity scores for a moving window of nt were plotted at 25 nt intervals as a function of genome position (fig 1). ORFs 1, 2, 3, and 5 and the intergenic region were the least variable regions. ORF 4 was slightly more variable and ORF 6 was the most variable. The ORF 6 nt sequences contained two hypervariable regions which corresponded in position with those noted for the predicted amino acid sequences of ORF 6 [24]. In the second approach, substitutions in each ORF were classified as either silent or expressed (table III). All ORFs contained approximately the same number of changes per kb, though ORF 2 had significantly fewer than ORF 4. The majority of substitutions in each ORF were silent. The proportion of silent substitutions was similar in ORFs 1 to 4, significantly higher in ORF 5, and considerably lower in ORF 6. ORFs 4 and 5 had the highest densities of silent substitutions. Those densities were significantly higher than that observed for ORF 2, and were such that about 26% of silent positions had substitutions in at least one isolate. The mean numbers of substitutions per isolate for ORFs 4 and 6 were similar. However, since those in ORF 4 tended to be at the same positions in several isolates while those in ORF 6 were scattered, the variation measured by the plotting algorithm (fig 1) was greater for ORF 6.
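Both approaches of this section can be sketched in Python: a moving-window similarity profile using the 2/1/0 scoring rule from Materials and methods, and a silent-versus-expressed test for a single substitution under the standard genetic code. Summing over all ordered pairs of isolates, self-pairs included, is an assumption chosen because it reproduces the maximum of 6400 given in the fig 1 legend (8 × 8 pairs × 50 positions × 2); scoring a gap against a gap as an identity is likewise assumed, and the function names are hypothetical.

```python
from itertools import product

PURINES = {"A", "G"}


def pair_score(a, b):
    """Identity 2, transition 1, transversion or gap 0.
    A gap matched with a gap counts as an identity (an assumption)."""
    if a == b:
        return 2
    if "-" in (a, b):
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 0


def similarity_profile(seqs, window=50, step=25):
    """Summed pair scores for windows of `window` columns, every `step`.

    Sums over all ordered pairs of sequences, self-pairs included.
    Returns (1-based position of first residue, score) tuples.
    """
    length = len(seqs[0])
    profile = []
    for start in range(0, length - window + 1, step):
        score = 0
        for i in range(start, start + window):
            column = [s[i] for s in seqs]
            score += sum(pair_score(a, b) for a, b in product(column, repeat=2))
        profile.append((start + 1, score))
    return profile


# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate("TCAG")
    for j, b in enumerate("TCAG")
    for k, c in enumerate("TCAG")
}


def is_silent(cds, pos, new_base):
    """True if changing cds[pos] (0-based, frame starting at index 0)
    to `new_base` leaves the encoded amino acid unchanged."""
    start = pos - pos % 3
    codon = cds[start:start + 3]
    offset = pos - start
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    return CODONS[codon] == CODONS[mutant]


if __name__ == "__main__":
    print(similarity_profile(["A" * 50] * 8))  # [(1, 6400)]
    print(is_silent("CTT", 2, "C"))            # True: CTT and CTC both encode Leu
```

Because identical columns contribute the maximum to every pair, positions where several isolates share the same substitution (as in ORF 4) lower the window score less than the same number of scattered substitutions (as in ORF 6).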
Insertions and deletions
The CaMV DNA consensus sequence was used to identify insertions and deletions (table IV) in the nt sequences of the isolates. Only insertions were found in CM1841, while both insertions and deletions were found in all other isolates analyzed. When counting insertions and deletions, those shared by more than one isolate were considered a single event. There was a slight excess of insertions (15 events) over deletions (12 events). Insertions ranged from 1 to 42 nt in length, while deletions varied from 1 to 421 nt. No insertions or deletions were found in ORFs 1 and 3. The only deletion in ORF 2 was that of CM4-184, which removed 88% of the gene [25]. ORF 4 contained half of the ORF-specific insertions/deletions, a few were found in the C-terminal part of ORF 6, and two were found in ORF 5. The large intergenic region had the greatest proportion of total insertions and deletions, 44%. Of all isolates considered, isolate XinJing had the nt sequence with the most insertion/deletion events. Nineteen percent of all insertion/deletion events were shared between isolates XinJing and D/H. The majority of insertion/deletion events (63%) may be attributed to transient template misalignment by the polymerase, either at homooligomeric stretches or at regions of direct repeats. Of the remaining events, four could possibly be deletions consistent with transient template misalignment. Of the six unexplained events, four were in the XinJing sequence.

Table IV. CaMV isolate-specific insertions and deletions relative to the consensus DNA sequence. Positions are indicated as the first nt of the insertion or deletion relative to the Cabbage S sequence or, when that nt is not present in the Cabbage S sequence, the nt immediately preceding it. Isolate abbreviations used: DH, D/H; XJ, XinJing; BJI, Cabbage B-JI; CS, Cabbage S; NY, NY8153; CM1, CM1841; CM4, CM4-184; CV, CMV-1. Regions identified in parentheses are ORFs (numerals) or intergenic regions (numerals followed by IG).

Type         Position (region)    Length (nt)   Isolates
Insertions   304 (7)              1             DH, XJ
             1345 (1-2 IG)        1             BJI
             2405 (4)             3             DH, CS, CV
             2437 (4)             42            XJ
             3344 (4)             3             NY, CV, BBC, CS
             3634 (4)             3             NY, BJI, CM1, CM4
             3668 (4, 5)          6             DH, XJ
             5720 (5-6 IG)        1             CM1, CM4
             7265 (6)             6             DH, XJ
             7304 (6)             3             XJ
             7316 (6)             3             XJ
             7478 (6-7 IG)        2             XJ
             7482 (6-7 IG)        1             BJI
             7499 (6-7 IG)        1             BJI
             7967 (6-7 IG)        2             BJI, BBC
Deletions    1346 (1-2 IG)        5             DH, XJ
             1388 (2)             421           CM4
             2543 (4)             3             NY, CV, BJI
             4172 (5)             21            DH, XJ
             7363 (6-7 IG)        1             CM4
             7471 (6-7 IG)        1             DH
             7481a (6-7 IG)       9             CS
             7481c (6-7 IG)       1             XJ
             7481d (6-7 IG)       1             DH
             7767 (6-7 IG)        1             XJ
             7990 (6-7 IG)        1             BJI
             8022 (6-7 IG)        1             BJI

Secondary structure of intergenic region
Base substitutions in the intergenic region were classified by their location in base-paired or non-paired regions in the model of Fütterer et al [2]. Among 331 positions in potential base-paired regions, 6.3% had a substitution relative to the consensus nt in at least one isolate. Of these substitutions, only one pair (two positions) covaried to maintain base pairing (either a G-C or a C-G pair). Five additional substitutions were also not disruptive of secondary structure, since they produced G-U pairs. The remaining 14 were disruptive of the proposed pairing. Of the 188 positions not predicted to be paired, 16.5% had a substitution in at least one isolate. All insertions/deletions occurred in non-paired regions.

Discussion
The size of CaMV genomes, other than that of CM4-184, was maintained close to 8.0 kb by a balance of insertions and deletions in the intergenic region and ORFs 4, 5 and 6. Other coding regions did not change size. Natural selection for functionally important residues retards the accumulation of sequence variation. In the intergenic region, consistent with results based on fewer sequences [2], selection resulted in a more than two-fold reduction in the number of changes in putatively base-paired regions relative to non-base-paired regions, and in the absence of insertions/deletions in base-paired regions. The failure to detect extensive covariation of substitutions in base-paired regions suggests that, though this region is likely extensively folded, the precise positions of base-paired regions are not critical. In coding regions, selection resulted in more than half the substitutions producing no change in coding, and in the absence of insertions or deletions in many coding regions. Patterns of change in ORFs 1, 2 and 3 were similar, suggesting that they were under similar selection. Distinctive selection may govern ORFs 4, 5 and 6, whose patterns of change were unique and different from those of the remaining regions. ORF 4 had more variation in coding positions than ORFs 1, 2 and 3 and had the most insertions/deletions. These features are consistent with known variation in surface-exposed residues of capsid proteins of other viruses. Because ORF 5 had the lowest density of substitutions causing coding changes, strict preservation of the amino acid sequence of the viral reverse transcriptase may be important for CaMV propagation. ORF 6, however, had about as many coding as non-coding changes. Evolutionary constraints may not be as stringent for ORF 6. Yet the product of ORF 6 is a determinant of host range for CaMV [26-28], a function that may lead to host-specific constraints on its sequence. Part of the variability of ORF 6 is concentrated in two hypervariable regions ([24] and fig 1). Point mutations in these hypervariable regions alter host-specific interactions of the virus [28]. The variation in ORF 6 among isolates collected from the same host genus may reflect differing abilities to infect other hosts. ORF 6 apparently shares three features with the env genes of retroviruses: genomic position downstream of the pol gene [10]; highest variability among the coding regions of their respective viruses; and a major role in determining host-virus compatibility [29].

When only silent substitutions were considered, two halves of the CaMV genome differing in substitution density were evident. The density of silent substitutions was lower in the half of the genome containing ORFs 1, 2, 3 and 6 and higher in the half with ORFs 4 and 5. The two halves may differ in mutability, or the lower-density region may be under a selective constraint other than that imposed on the encoded amino acid sequence. Other selective constraints are unlikely, since silent substitutions in ORFs are as frequent as substitutions in loops and bulges of the intergenic region. The ORF 3-4 boundary between the two substitution-density halves corresponds to a region that appears to be a hot spot for recombination [30], and recombination cross-over points at the junction of ORFs 5 and 6 have been observed [30].
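The stem-pair reasoning at the start of this Discussion (covariation, G-U pairs, disruption, and the more than two-fold difference between paired and unpaired positions) can be made explicit with a small classifier. It is a sketch that assumes each consensus stem pair is Watson-Crick and reads DNA T as RNA U; it does not reproduce the folding model of [2], and the function names are hypothetical.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x, y):
    """Classify an RNA pair; DNA T is read as U."""
    pair = (x.upper().replace("T", "U"), y.upper().replace("T", "U"))
    if pair in CANONICAL:
        return "canonical"
    if pair in WOBBLE:
        return "wobble"
    return "disruptive"


def substitution_effect(consensus_pair, isolate_pair):
    """Effect of an isolate's substitutions on a predicted stem pair.

    Assumes the consensus pair is Watson-Crick. A single change cannot turn
    one Watson-Crick pair into another, so a canonical result after a change
    means both partners changed together (covariation).
    """
    if consensus_pair == isolate_pair:
        return "unchanged"
    kind = classify_pair(*isolate_pair)
    if kind == "canonical":
        return "covariation"
    return kind  # "wobble" (G-U) or "disruptive"


def density_ratio(pct_unpaired, pct_paired):
    """Fold difference in the percentage of positions with a substitution."""
    return pct_unpaired / pct_paired


if __name__ == "__main__":
    print(substitution_effect(("G", "C"), ("C", "G")))  # covariation
    print(substitution_effect(("G", "C"), ("G", "T")))  # wobble
    print(round(density_ratio(16.5, 6.3), 1))           # 2.6
```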
Lentiviral RNAs are distinguished from those of other retroviruses by their high A content (34 to 38%) [31]. The A content of the CaMV (+) strand (37%) falls in that range. Since the T content is only 23%, there is an asymmetry in strand base composition. This may be due to the coding requirements of the plus strand, to the need to package CaMV DNA in an icosahedral particle, or to the error spectrum of CaMV DNA replication processes. Stretches of oligo(A)·oligo(T) introduce kinks into double-stranded DNA [32]. Preservation of a kink pattern compatible with packaging may be a selective force that maintains the strand asymmetry. A priori, to maintain an A-rich strand, the frequency of mutation at A's must be suppressed and/or the frequency of mutation to A's must be enhanced. No enhancement of mutation to A's was noted in CaMV DNAs. On the contrary, we observed an enhanced frequency of transversion of A residues to pyrimidines. Unbalanced, such transversions would make the strand A-deficient. Yet, because transversions are less frequent than transitions, this enhancement made only a minor contribution to the base composition. The main contribution to the A-richness must be due to a suppression of A to G transition frequency. This suppression could be due to selection acting on the encoded amino acid sequences or to a property of the polymerase.

Both nuclear DNA replication of minichromosomes and the linked processes of transcription and reverse transcription could potentially produce sequence variants. That the variation was principally due to transcription-reverse transcription is suggested by the observation that differences in substitution frequencies were not strand-symmetric. The suppression of the frequency of consensus A's replaced by G's was not mirrored by a similar suppression of consensus T's replaced by C's, and the excess of transversions involving A's was not mirrored by an excess of transversions involving T's. Strand asymmetry in substitution frequency is expected for replication mechanisms that use transcription, which produces only a (+) sense nucleic acid strand, and reverse transcription of the RNA, which produces only (-) strand DNA. Because different processes are involved in (+) and (-) strand synthesis, errors of incorporation should be strand-asymmetric. In contrast, nuclear DNA replication should produce similar errors when copying each of the parental strands.

Some features of CaMV substitution frequencies were unlike those of retroviruses. CaMV transversions involving A dominated over transversions involving G. In contrast, excesses of G to T transversions are created by HIV-1 [22], avian myeloblastosis virus, and Moloney murine leukemia virus reverse transcriptases [21]. HIV-1 reverse transcriptase produces an excess of G to A transitions [33].
In addition, a large excess of G → A transitions is found in base substitution profiles constructed for HIV-1 [29]. The CaMV base substitution profile did not have an excess of G ↔ A transitions. For HIV-1, transient template misalignment was suggested to be responsible for the excess of G to T transversions and for the excess of G → A transitions [33]. We found no evidence of transient template misalignment for CaMV DNA substitutions, though misalignment played a role in insertions and deletions. Most of the latter occurred at stretches of the same nt (eg an oligo(A) stretch) or at regions of direct repeats. Thus, though the CaMV reverse transcriptase appears to be the major cause of sequence variation in CaMV DNAs, it may commit different errors than do retroviral reverse transcriptases. Purified retroviral reverse transcriptases do exhibit varying misincorporation specificities and varying propensities for mismatch extension [34].
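The attribution of insertions and deletions to transient template misalignment, at homooligomeric stretches or at direct repeats, can be approximated by a flank-matching heuristic. This is an illustrative sketch, not the criterion used in this study: the `min_repeat` threshold of 3 nt and the function names are hypothetical.

```python
def flanks(ref, start, segment, kind):
    """Sequence to the left and right of an indel in reference `ref`.

    For a deletion ("del"), `segment` is ref[start:start + len(segment)]
    and is skipped; for an insertion ("ins") it sits between
    ref[start - 1] and ref[start].
    """
    if kind == "del":
        if ref[start:start + len(segment)] != segment:
            raise ValueError("deleted segment does not match the reference")
        return ref[:start], ref[start + len(segment):]
    return ref[:start], ref[start:]


def slippage_explanation(ref, start, segment, kind, min_repeat=3):
    """Classify an indel as consistent with transient template misalignment.

    Returns "homooligomeric run" if the segment is a run of one base next
    to the same base, "direct repeat" if at least `min_repeat` nt at one end
    of the segment recur in the adjacent flank, and None otherwise.
    """
    left, right = flanks(ref, start, segment, kind)
    if len(set(segment)) == 1 and segment[0] in (left[-1:], right[:1]):
        return "homooligomeric run"
    for k in range(len(segment), min_repeat - 1, -1):
        if left.endswith(segment[-k:]) or right.startswith(segment[:k]):
            return "direct repeat"
    return None


if __name__ == "__main__":
    # Extra A inserted into an oligo(A) stretch.
    print(slippage_explanation("GCAAAT", 2, "A", "ins"))         # homooligomeric run
    # Deletion of GACCA between two copies of GAC (TT GAC CA GAC TT).
    print(slippage_explanation("TTGACCAGACTT", 2, "GACCA", "del"))  # direct repeat
```

Checking both flanks matters because an indel within a repeat can be placed at either copy in an alignment; the heuristic accepts either representation.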
That the error-prone processes of transcription and reverse transcription appear important in CaMV variation seems inconsistent with the estimated lower rate of substitution in the CaMV genome (4 to 6 × 10⁻⁴ substitutions per site per passage) [8] relative to retroviral genomes (10⁻² to 10⁻³ substitutions per site per year) [11]. Several factors could contribute to this difference. First, CaMV reverse transcriptase may not only commit errors different from those committed by retroviral reverse transcriptases, but may also commit them less frequently. Second, an error correction mechanism may operate on CaMV DNA. Gene conversion may be such a mechanism [35], but early evidence for its importance was not substantiated [30]. Third, the presence in infected plant cells of multiple minichromosomes, instead of a single integrated proviral DNA, may effectively raise the population size and thus reduce the bottleneck effect. Fourth, in the plant, the majority of CaMV amplification may occur by DNA replication of minichromosomes rather than by reverse transcription. CaMV has been shown to spread through the plant via the phloem tissue [36]. Once in the phloem, CaMV may reach the actively dividing cells of young leaves. Thus, CaMV minichromosomes could spread throughout the plant by nuclear DNA replication and simple cell division. This could limit the number of reverse transcription cycles during the spread of CaMV infection in the plant. One or more of these explanations could account for the observed CaMV base-substitution profile and the lower estimated CaMV mutation rate.

Acknowledgments
This work was supported by the Robert Glenn Rapp Foundation, the Oklahoma Health Research Program, and the Oklahoma Agricultural Experiment Station, of which this is Journal Article No J-6368.
References
1 Franck A, Guilley H, Jonard G, Richards K, Hirth L (1980) Nucleotide sequence of cauliflower mosaic virus DNA. Cell 21, 285-294
2 Fütterer J, Gordon K, Bonneville JM, Sanfaçon H, Pisan B, Penswick J, Hohn T (1988) The leading sequence of caulimovirus large RNA can be folded into a large stem-loop structure. Nucleic Acids Res 16, 8377-8390
3 Melcher U (1988) A readable and space-efficient DNA sequence representation: application to caulimoviral DNAs. Comput Appl Biosci 4, 93-96
4 Fütterer J, Gordon K, Sanfaçon H, Bonneville JM, Hohn T (1990) Positive and negative control of translation by the leader sequence of cauliflower mosaic virus pregenomic 35S RNA. EMBO J 9, 1697-1707
5 Howell SH, Walker LL, Walden RM (1981) Rescue of in vitro generated mutants of cloned cauliflower mosaic virus genome in infected plants. Nature 293, 483-486
6 Dixon LK, Koenig I, Hohn T (1983) Mutagenesis of cauliflower mosaic virus. Gene 25, 189-199
7 Daubert S, Shepherd RJ, Gardner RC (1983) Insertional mutagenesis of the cauliflower mosaic virus genome. Gene 25, 201-208
8 Pennington R, Melcher U (1993) In planta deletion of DNA inserts from the large intergenic region of cauliflower mosaic virus DNA. Virology 192, 188-196
9 Covey SN (1991) Pathogenesis of a plant pararetrovirus: CaMV. Semin Virol 2, 151-159
10 Hohn T, Fütterer J (1991) Pararetroviruses and retroviruses: a comparison of expression strategies. Semin Virol 2, 55-69
11 Gojobori T (1990) Molecular clock of viral evolution, and the neutral theory. Proc Natl Acad Sci USA 87, 10015-10018
12 Girones R, Miller RH (1989) Mutation rate of the hepadnavirus genome. Virology 170, 595-597
13 Melcher U (1990) Similarities between putative transport proteins of plant viruses. J Gen Virol 71, 1009-1018
14 Gardner RC, Howarth AJ, Hahn P, Brown-Luedi M, Shepherd RJ, Messing J (1981) The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res 9, 2872-2888
15 Balázs E, Guilley H, Jonard G, Richards K (1982) Nucleotide sequence of DNA from an altered-virulence isolate D/H of the cauliflower mosaic virus. Gene 19, 239-249
16 Rongxiang F, Xiaojun W, Ming B, Yingchuan T, Faxing C, Keqiang M (1985) Complete nucleotide sequence of cauliflower mosaic virus (XinJing isolate) genomic DNA. Chin J Virol 1, 247-256
17 Chenault KD, Steffens DL, Melcher U (1992) Nucleotide sequence of cauliflower mosaic virus isolate NY8153. Plant Physiol 100, 542-545
18 Chenault KD, Melcher U (1993) Cauliflower mosaic virus isolate CMV-1. Plant Physiol 101, 1395-1396
19 Chenault KD, Melcher U (1993) The complete nucleotide sequence of cauliflower mosaic virus isolate BBC. Gene 123, 255-257
20 Dixon L, Nyffenegger T, Delley G, Martinez-Izquierdo J, Hohn T (1986) Evidence for replicative recombination in cauliflower mosaic virus. Virology 150, 463-468
21 Bebenek K, Abbotts J, Roberts JD, Wilson SH, Kunkel TA (1989) Specificity and mechanism of error-prone replication by human immunodeficiency virus-1 reverse transcriptase. J Biol Chem 264, 16948-16956
22 Roberts JD, Preston BD, Johnston LA, Soni A, Loeb LA, Kunkel TA (1989) Fidelity of two retroviral reverse transcriptases during DNA-dependent DNA synthesis in vitro. Mol Cell Biol 9, 469-476
23 Kunkel T, Soni A (1988) Mutagenesis by transient misalignment. J Biol Chem 263, 14784-14789
24 Sanger M, Daubert S, Goodman RM (1991) The regions of sequence variation in caulimovirus gene VI. Virology 182, 830-834
25 Howarth AJ, Gardner RC, Messing J, Shepherd RJ (1981) Nucleotide sequence of naturally occurring deletion mutants of cauliflower mosaic virus. Virology 112, 678-685
26 Schoelz J, Shepherd RJ, Daubert S (1986) Region VI of cauliflower mosaic virus encodes a host range determinant. Mol Cell Biol 6, 2632-2637
27 Schoelz JE, Shepherd RJ, Daubert SD (1987) Host response to cauliflower mosaic virus (CaMV) in solanaceous plants is determined by a 496 bp DNA sequence within gene VI. In: Molecular Strategies for Crop Protection (Arntzen CJ, Ryan CA, eds) Alan R Liss, New York, 253-265
28 Daubert S, Routh G (1990) Point mutations in cauliflower mosaic virus gene VI confer host-specific symptom changes. Mol Plant-Microbe Interact 3, 341-345
29 Shimizu N, Okamoto T, Moriyama EN, Takeuchi Y, Gojobori T, Hoshino H (1989) Patterns of nucleotide substitutions and implications for the immunological diversity of human immunodeficiency virus. FEBS Lett 250, 591-595
30 Vaden VR, Melcher U (1990) Recombination sites in cauliflower mosaic virus DNAs: implications for mechanisms of recombination. Virology 177, 717-726
31 Wain-Hobson S (1992) Human immunodeficiency virus type 1 quasispecies in vivo and ex vivo. Curr Top Microbiol Immunol 176, 181-193
32 Widom J (1985) Bent DNA for gene regulation and DNA packaging. BioEssays 2, 11-14
33 Vartanian JP, Meyerhans A, Åsjö B, Wain-Hobson S (1991) Selection, recombination, and G → A hypermutation of human immunodeficiency virus type 1 genomes. J Virol 65, 1779-1788
34 Williams KJ, Loeb LA (1992) Retroviral reverse transcriptases: error frequencies and mutagenesis. Curr Top Microbiol Immunol 176, 165-180
35 Choe IS, Melcher U, Richards K, Lebeurier G, Essenberg RC (1985) Recombination between mutant cauliflower mosaic virus DNAs. Plant Mol Biol 5, 281-289
36 Leisner SM, Turgeon R, Howell SH (1992) Long distance movement of cauliflower mosaic virus in infected turnip plants. Mol Plant-Microbe Interact 5, 41-47