VIROLOGY 178, 461-468 (1990)

The Nucleotide Sequence of Abutilon Mosaic Virus Reveals Prokaryotic as Well as Eukaryotic Features

T. FRISCHMUTH,¹ G. ZIMMAT, AND H. JESKE²

Institut für Allgemeine Botanik, Angewandte Molekularbiologie der Pflanzen, Ohnhorststrasse 18, D-2000 Hamburg 52, Federal Republic of Germany

Received October 24, 1989; accepted May 23, 1990
The complete nucleotide sequence of abutilon mosaic virus (West Indian isolate, AbMV) is presented. The resulting genomic structure resembles that of other geminiviruses which are transmitted by the whitefly Bemisia tabaci: AbMV possesses a bipartite circular genome with bidirectional orientation of the open reading frames (ORFs). Both components have a common region of 180 bases with 99% homology, while the rest of their sequences are distinct. Eukaryotic regulatory transcription elements precede most ORFs and polyadenylation signals are present at the end of most ORFs. However, two ORFs show features of prokaryotic genes. This chimaeric genome organisation is discussed with reference to the finding that AbMV DNA is present in plastids as well as in the nucleus of infected cells. © 1990 Academic Press, Inc.
INTRODUCTION

Abutilon mosaic virus (AbMV) is a member of the geminivirus group, which is characterized by single-stranded circular DNA and a twin particle morphology (Harrison, 1985; Abouzid and Jeske, 1986). It possesses a bipartite genome, as do other geminiviruses which are transmitted by the whitefly Bemisia tabaci (Abouzid et al., 1988b). AbMV particles were localized in the nuclei of bundle area cells of infected plants (Abouzid et al., 1988a). A replicative intermediate was purified from a chromatin-like structure (Abouzid et al., 1988b). However, viral DNA was also found in the plastids (Gröning et al., 1987, 1990). The localization of viral DNA in plastids as well as in nuclei should require an adaptation to the completely different genomic organisation of a eukaryotic and a prokaryotic-like environment. We asked whether this is reflected in structural elements of the viral genome. To answer this question the complete genome of AbMV was sequenced and its composition analysed.

MATERIALS AND METHODS

Plants, virus, and clones

AbMV DNA was isolated from Abutilon sellovianum var. marmorata REGEL and cloned into M13mp8 and M13mp19 vectors as previously described (Abouzid et al., 1988b). Full-length clones of DNA A (pAbMV 100) and DNA B (pAbMV 200) were obtained by insertion into the unique PstI and SacI sites, respectively. To analyse the insertion sites, subclones of viral DNA which extend over the PstI and SacI sites were isolated from infected plants. Subclones of the full-length clones were produced with different restriction enzymes to reveal overlapping sequences (Fig. 1). Cloning was performed under German safety guidelines L2/B1 according to the licence of the ZKBS 1526/1.

FIG. 1. Nucleotide sequence strategy of the AbMV DNA. Full-length clones were subcloned (arrows) with the indicated restriction enzymes and sequenced with the chain termination technique according to Sanger et al. (1977): B, BamHI; Bg, BglII; E, EcoRI; H, HpaII; Ha, HaeIII; Hc, HincII; Hd, HindIII; K, KpnI; P, PvuI; Ps, PstI; S, SalI; Sa, SacI.

¹ Present address: John Innes Institute, Colney Lane, Norwich NR4 7UH, U.K.
² To whom requests for reprints should be addressed.

0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
[FIG. 2. Nucleotide sequence listing: (a) DNA A; (b) DNA B.]
FIG. 2. Nucleotide sequences of AbMV DNAs: (a) DNA A; (b) DNA B. The DNA composition is 24% A, 23% C, 24% G, and 29% T for DNA A and 25% A, 22% C, 25% G, and 29% T for DNA B. The accession numbers for the EMBL data base (Heidelberg, FRG) are X15983 and X15984.

Sequence analysis

The nucleotide sequence was determined by the chain termination technique of Sanger et al. (1977) using [35S]thio-dATP as a radioactive label and the Sequenase kit (USBC) according to the manufacturer's recommendation. The sequence was evaluated with a number of different computer programs (Corpet, 1988; Fristensky et al., 1982; Larson and Messing, 1983; MacMolly, SoftGene, Berlin).

RESULTS

The sequencing strategy is documented in Fig. 1. The sequence was completely determined in both orientations. The resulting sequence is given in Figs. 2a and 2b for DNA A (2629 b) and DNA B (2581 b), respectively. The sequences are unique except for a small area of 180 b (99% homology): the so-called "common region" (CR) (Fig. 3). The DNA molecules are circular (Abouzid et al., 1986) and nucleotide 1 is arbitrarily set to the first nucleotide of the CR, following the convention for geminiviruses with bipartite genomes (Lazarowitz, 1987). The CR contains an inverted repeat which is highly conserved among all geminiviruses and which is a candidate structure for the origin of replication (Fig. 3) (Stanley and Davies, 1985). To analyse the coding capacity of the genome the sequences were searched for open reading frames
Nucleotides 1-90:
AbMVA   CGGTGGCATTTATGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG
        ******* *** ******************************************************************************
AbMVB   CGGTGGCGTTTTTGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG

Nucleotides 91-180:
AbMVA   AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGGCCATCCGCTATAATATTACCGGATGGCCGCGCGACCCCCC
        ******************************************************************************************
AbMVB   AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGGCCATCCGCTATAATATTACCGGATGGCCGCGCGACCCCCC

[Bottom panel: drawing of the potential hairpin loop formed by the inverted repeat (position labels 140 to 170), flanked by the sequences GCAACACGTG and GCGACCCCCC.]

FIG. 3. (Top) Comparison of the homologous region of AbMV DNA A and DNA B. Arrows indicate inverted repeats, and identical nucleotides are aligned. The numbers on top of the sequences refer to the nucleotide numbers of DNA A as well as DNA B. (Bottom) Representation of a potential hairpin loop in the common region. Its calculated free energy value is ΔG (25°) = -28 kcal.
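The comparison in Fig. 3 can be checked with a few lines of Python. The following is a minimal sketch, not the authors' procedure: the two 180-nucleotide strings are transcribed from the figure (with the damaged stretch of the DNA A row read as CATGTAATTGG, which is an assumption), and the loop sequence CTATAATATTAC as well as all function names are chosen here for illustration only.

```python
# Common region (nucleotides 1-180) transcribed from Fig. 3.
# Assumption: the damaged stretch in the DNA A row is read as "CATGTAATTGG".
HEAD_A = "CGGTGGCATTTATGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCT"
HEAD_B = "CGGTGGCGTTTTTGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCT"
TAIL = (
    "CAAAACTTGCTCATGTAATTGGAGTATTGG"
    "AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGGCCATCCGCTA"
    "TAATATTACCGGATGGCCGCGCGACCCCCC"
)
CR_A = HEAD_A + TAIL
CR_B = HEAD_B + TAIL

# Loop of the putative hairpin as read from the sequence (illustrative choice).
LOOP = "CTATAATATTAC"

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_positions(a, b):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a, b):
    """Percentage of identical positions in an ungapped comparison."""
    return 100.0 * (len(a) - len(mismatch_positions(a, b))) / len(a)


def stem_length(seq, loop):
    """Count Watson-Crick pairs extending outward from the first occurrence of loop."""
    start = seq.find(loop)
    if start < 0:
        raise ValueError("loop sequence not found")
    i, j = start - 1, start + len(loop)
    pairs = 0
    while i >= 0 and j < len(seq) and PAIR[seq[i]] == seq[j]:
        pairs += 1
        i -= 1
        j += 1
    return pairs


if __name__ == "__main__":
    print("mismatches at:", mismatch_positions(CR_A, CR_B))
    print("identity: %.1f%%" % percent_identity(CR_A, CR_B))
    print("stem length (bp):", stem_length(CR_A, LOOP))
```

With these transcriptions the two rows differ at two of 180 positions, which agrees with the 99% homology quoted in the text, and the inverted repeat pairs over 11 consecutive bases on either side of the loop.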
(ORF) for proteins larger than 10 kDa, assuming that the first ATG is the start point (Fig. 4). In both DNAs the viral as well as the complementary sequences contain ORFs. Potential eukaryotic promoter structures are indicated in the maps (Fig. 4, triangles). Polyadenylation sites are present opposite the common region at the junction of two ORFs from different directions (Fig. 4, P). While in DNA B only one frame is used, in DNA A three frames in the complementary sense contain overlapping ORFs. To facilitate the comparison with other geminiviruses we follow the internationally accepted nomenclature for viral (V) and complementary (C) sense ORFs in DNA A and DNA B (Fig. 4). The overall genomic organisation is equivalent to that of other geminiviruses with a bipartite genome (Lazarowitz, 1987). So far, the sequence contains sufficient regulatory elements to control transcription in the nucleus. Because we found viral DNA in the plastids (Gröning et al., 1987, 1990) we further addressed the question of whether the viral genome possesses the structural capability of expression in a prokaryotic-like genetic system. As an indicator we looked for ribosome-binding sites for 70 S ribosomes (Shine-Dalgarno (SD) sequences) in the correct position relative to an open reading frame, assuming that ATG as well as GTG can serve as start codon in such a system. We found one area which is a good candidate for prokaryotic expression (Fig. 5). A duplication of the SD sequence, unprecedented to our knowledge, is present close to an ATG as well as a GTG. The following ORFs (pro 1 and pro 2) are located in tandem using the same frame as the AC1 (Fig. 4). Looking for potential promoter structures in front of these ORFs, we detected a consensus sequence to Escherichia coli promoters with a considerable score of similarity using the algorithm of Staden (1984) (Fig. 5: -35; -10; +1, underlined). As AbMV is the only geminivirus to date whose DNA has been detected in the plastids, we compared the sequences of several group members in the putative prokaryotic region. None of the geminiviruses with closely related AC1 ORFs (BCTV, BGMV, CLV, and TGMV) possesses all the prokaryotic characteristics presented here. The closest related in this respect, BCTV, is homologous in the consensus sequence of the E. coli promoter and bears a core SD sequence (AGGA) 18 b in front of the ATG of pro 1, but the SD sequence as well as the start codon of pro 2 is deleted. In CLV an SD sequence (AGGAGA) is present 13 b in
[Genome maps of AbMV DNA A and DNA B with ORF arrows, common region (CR), TATA boxes, and polyadenylation sites (P).]

AbMV DNA A
ORF    mol. wt.    nucleotide
V1     28 000       388 - 1110
C1     40 200         9 - 1574
C2     14 400      1644 - 1258
C3     15 900      1508 - 1113

AbMV DNA B
ORF    mol. wt.    nucleotide
V1     29 800       548 - 1318
C1     32 900      2253 - 1377

[Sequence detail at P, DNA A (1110-1150): region between the stop codons (***) of V1 and C3. Sequence detail at P, DNA B (1320-1380): region between the stop codons (***) of V1 and C1.]

FIG. 4. The genomic organisation of AbMV DNA A and DNA B showing the orientation of open reading frames with a coding capacity for proteins with a molecular weight higher than 10,000. (Solid arrows) ORFs with eukaryotic transcription signals; (triangles) TATA box; (P) polyadenylation signal; (dashed arrows) ORFs with prokaryotic features; (CR) common region (see Fig. 3). The tables summarize nucleotide start and stop positions as well as coding capacity of the ORFs in viral (V) and complementary sense (C) DNA. Sequences at position P below the maps demonstrate potential polyadenylation signals (underlined) in relation to stop codons (***).
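The ORF search summarized in Fig. 4 (first ATG as start point, both strands, a lower size limit of 10 kDa) can be sketched as follows. This is an illustration under stated assumptions rather than the program used for the paper: the sequence is treated as linear, so an ORF running across nucleotide 1 of the circular molecule (as the C1 coordinates of DNA A in Fig. 4 suggest) would be missed; the average residue mass of 110 Da is a rule-of-thumb value; and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
AVG_RESIDUE_MASS = 110  # Da per amino acid; rule-of-thumb assumption


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def min_codons_for(mass_da):
    """Smallest number of codons whose product reaches mass_da (ceiling division)."""
    return -(-mass_da // AVG_RESIDUE_MASS)


def find_orfs(seq, min_codons):
    """List ORFs as (strand, first, last, n_codons).

    An ORF runs from the first ATG after a stop codon to the next in-frame
    stop. Positions are 1-based in viral-strand numbering and include the
    stop codon, so complementary-sense ORFs have first > last, as in Fig. 4.
    n_codons excludes the stop codon.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("V", seq), ("C", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        if strand == "V":
                            first, last = start + 1, i + 3
                        else:
                            first, last = n - start, n - i - 2
                        orfs.append((strand, first, last, n_codons))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence (not viral): one short ORF in the viral sense.
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))
    print("codons needed for 10 kDa:", min_codons_for(10_000))
```

As a plausibility check on the table, the C2 entry of DNA A (1644 to 1258) spans 387 nucleotides, that is 128 codons plus a stop, or roughly 14 kDa at 110 Da per residue, which is close to the tabulated 14 400.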
front of a GTG, comparable to the start codon of pro 2, but the promoter consensus sequence as well as the SD sequence of pro 1 is absent. The sequences of BGMV and TGMV do not show any striking similarities to prokaryotic transcription or translation elements. When the deduced protein sequences of various C1 ORFs were aligned, a relatively high amount of diversity was found around the beginning of the pro 1 ORF, although the rest of the sequences are highly conserved (Fig. 6). It is noteworthy that in the monopartite geminiviruses with monocotyledon hosts a protein sequence homologous to C1 is split into two ORFs in different frames at exactly this locus (Donson et al., 1987; MacDowell et al., 1985; Mullineaux et al., 1984). The start codon of the second ORF of WDV is GTG, which is functional in prokaryotes and plastids. Some evidence has been presented that these ORFs are joined after transcription by splicing (Accotto et al., 1989; Schalk et al., 1989). To test whether any homology exists between AbMV DNA and the chloroplast genome its sequence was
[Sequence of the pro 1 and pro 2 region with the annotated promoter elements -35 (s. = -31.7), -10 (s. = -23.5), and +1 (s. = -14.3). Translation shown beneath the sequence: MetIleIleLeuGlyValValProLeuArgProAspArg ... Arg***.]

FIG. 5. Putative open reading frames with prokaryotic features in DNA A: Shine-Dalgarno sequences (double underline) and an E. coli promoter consensus sequence (single underline). The score numbers (s.) refer to the Staden (1984) algorithm for similarity of a given sequence with the consensus of E. coli promoters. For location of the sequence, see Fig. 4, pro 1 and pro 2; the DNA sequence is the reverse complement of nucleotides 2226 to 1429; (***) stop codons.
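The search for Shine-Dalgarno sequences in front of ATG or GTG start codons can be illustrated with a short scan. This is a sketch under assumptions, not the analysis behind Fig. 5: only the core motif AGGA mentioned in the text is used, the allowed spacing of 4 to 18 nucleotides between motif and start codon is an arbitrary choice, and the function name and test sequences are made up for the example.

```python
START_CODONS = ("ATG", "GTG")


def sd_candidates(seq, core="AGGA", min_gap=4, max_gap=18):
    """Find start codons preceded by an SD-like core motif.

    Returns tuples (position, codon, core, gap): position is the 1-based
    position of the first base of the start codon, and gap is the number of
    nucleotides between the end of the core motif and the start codon.
    """
    hits = []
    for p in range(len(seq) - 2):
        codon = seq[p:p + 3]
        if codon not in START_CODONS:
            continue
        for gap in range(min_gap, max_gap + 1):
            s = p - gap - len(core)
            if s >= 0 and seq[s:s + len(core)] == core:
                hits.append((p + 1, codon, core, gap))
    return hits


if __name__ == "__main__":
    # Toy sequences (not viral), one with an ATG and one with a GTG start.
    with_atg = "TTTAGGAGGTTTTTTTATGAAA"
    with_gtg = "AGGAGA" + "T" * 13 + "GTGCCC"
    print(sd_candidates(with_atg))
    print(sd_candidates(with_gtg))
```

A real analysis would also score the promoter elements, as done in the paper with the algorithm of Staden (1984); that step is not reproduced here.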
compared with that of tobacco chloroplast DNA (Shinozaki et al., 1986). No extensive homology was found in computer searches, but one small stretch of similarity was detected (Fig. 7). Although it is difficult to assess the significance of such a small sequence in the 150 kb of chloroplast DNA, we document it here because it is the most conserved sequence in all geminiviruses and is located in the viral hairpin loop, which is thought to be the origin of replication (Fig. 3). Moreover, part of this sequence is the conserved motif at the origin of replication of adenoviruses (Graham et al., 1989). The corresponding plastidal sequence is located in the intron of the tRNALys gene, which contains an unidentified open reading frame (Sugita et al., 1985).

DISCUSSION

Abutilon mosaic virus is one of the classical examples of early virology (Baur, 1906). By sequencing we have confirmed that its original West Indian isolate AbMV (Hertzsch, 1928; Regel, 1875) is a member of the geminivirus group. It is a close relative of bean golden mosaic virus and tomato golden mosaic virus, both from Latin America, as deduced from sequence comparisons (Howarth and Vandemark, 1989). The infectivity of the clones analysed here was proven by the technique of agroinfection (Grimsley et al., 1986) on Nicotiana benthamiana, Nicotiana clevelandii, Nicotiana tabacum var. samsun, Nicotiana tabacum var. xanthi, and Malva parviflora (data not shown). We conclude therefore that the sequence presented here contains the essential genetic structures for replication and spread throughout the plant. Geminivirus particles are localized in the nucleus (Abouzid et al., 1988a; Harrison, 1985). Their DNA is transcribed to polyadenylated RNA (Accotto et al., 1989; Kallender et al., 1988; Schalk et al., 1989; Townsend et al., 1985) and most promoter structures are typical for nuclear transcription. Comparable transcription was found for AbMV in S1 nuclease protection and primer extension experiments (Frischmuth, S., unpublished data). We found viral ssDNA (Gröning et al., 1987) and replicative dsDNA (Gröning et al., 1990) in the plastids. Both genomic DNAs are present in these organelles
AB   MPPP-KKFRVQAKNYFLTYPQCSLTKDEALSQLQNLETPVNKKF~KICRELHENGEPHLHVLIQFEGKYQCTNNRFFDLVSPTRSAHFHPNIDGAKSSSDVKSY~DKDGDT
BG   MPPP-QRFRVQSKNYFLTYPRCTIPKEEALSQLQKIHTTINKKFIKVCEERHDNGEPHLHAL~QFEGKFICTNKRLFDLVSTTRSAHFHPN~QGAKSSSDVKEYIDKDGVT
TG   MPS,HLKRFQINAKNYFLTYPQCSLSKEESLSQLQALNTPINKKF~K~CRELHEDGQPHLHVLIQFEGKYCCQNQRFFDLVSPTRSAHFHPNIQRAKSSSDVKTYIDKDGDT
BC   MPPT-KRFRIQAKNIFLTYPQCSLSKEEALEQ~QR~QLSSNKKY~KIARELHEDGQPHLHVLLQLEGKVQITNIRLFDLVSPTRSAHFHPNIQRAKSSSDVKSYVDKDGDT
CL   MRTP--RFRIQAKNVFLTYPKCSIPKEHLLSFIQTLSLQSNPKFIKICRELHQNGEPHLHAL~QFEGK~TITNNRLFDCVHPSCSTSFHPN~QGAKSSSDVKSYLDKDGDT

AB   AEWGEFQIDGRSARGGQQTANDSYAKALNAGDVQSALNILLER~FAKAPEPWVAGFP-SPLSRPFPRRCRSGR~IILG-WPLRPDRPLSLIV
BG   IEWGQFQVDGRSARGGQQSANDSYAKALNADSIESALTILKEEQPKDYVLQNHN~RSNLERIFFKVPEPWVPPFPLSSFVNIP~~QD~-DDYFGRGSAARPERPISIIV
TG   LVWGEFQVDGRSARGGCQTSNDAAAEALNASSKEEALQ~IREKIPEKYLFQFHNLNSNLDRIFDKTPEPWLPPFHVSSFTN~PDE~RQW-AENYFGKS~MRPERP~SIII
BC   IEWGEFQIDGRSARGGQQTANDSYAKALNATSLDQALQILKEEQPKDYFLQHHNLLNNAQKIFQRPPDPWTPLFPLSSFTNVPEE~QEW-ADAYFGVDAAARPLRYNSIIV
CL   VEWGQFQIDGRSARGGQQSANDAYAKALNSGSKSEALNVIRELVPKDFVLQFHNLNSNLDRIFQEPPAPYVSPFPCSSFDQVPVEIEEWVADNV--RDSAARPWRPNSIVI
MS   IjDGFCIQSSDERSRKQSLYI
WD   BGRLFQ-ESPGRHK-SIYI

AB   EGDSRTGKTMWARALGPHNYLSGHLDFNGRVYSNEVEYNVIDDVAPHYLKLKHWKELLGAQKD~SNCKLAKP--VQIKGGIRAIVLCNPGEGSSYKEYLDKEE-NRG...
BG   EGDSRTGKTMWARALGPHNYLSGHLDFNSL~SNSVEYNVIDDITPNYLKLKDWKELIGEQKD~SNCKYGKP--VQIKGGIPSIVLCNPGEGSSYKDFLNKEEK-PA...
TG   EGDSRTGKTMWARSLGPHNYLSGHLDLNSRVYSNKVEYNVIDDVTPQYLKLKHWKEL~GAQRD~TNCKYGKP--VQIKGGIPSIVLCNPGEGASYKVFLDKEE-NTP...
BC   EGDSRTGKTMWARSLGAHNYITGHLDFSPRTYYDEVEYNVIDD~PTYLKMKHWKHLIGAQKE~TNLKYGKPR-V-IKGGIPCIILCNPGPESSYQQFLEKPE-NEA...
CL   EGDSRTGKTIWARSLGPHNYLCGHLDLSPKVFNNMWYNVIDD~PHYLK~-HFKEFMGSQRDVPSNTKYGKP--VQIKGGIPTIFLCNPGPTSSYKEFLA-EEKQEA...
MS   VGPTRTGKSTWARSLGVHNYWPNNVDWSS--YNEDAIYNIVDD--IPFKFCPCWKQLVGCQRDFIVNPKYGKKKKVQKKSK-PTIILANSD-EDWMKE--MTPGQLEY...
WD   CGPTRTGKTSWARSLGTHNYYNSLVDFTT--YDVNAKYNI~DD~~IPFKFTPNWKCFVGAQRDFTVNPKYGK-RKV-~RGGIPCI~LVNPD~EDWLKD--MTPEQSDY...

FIG. 6. Comparison of the C1 ORFs of geminiviruses with dicot hosts (AB, BC, BG, CL, TG) and the C2 of those with monocot hosts (MS, WD). The arrow above AB indicates the beginning of the prokaryotic-like ORF pro 1 of AbMV. Possible internal start codons near this position in related geminiviruses are underlined. (AB, AbMV; BC, beet curly top virus; BG, bean golden mosaic virus; TG, tomato golden mosaic virus; CL, cassava latent virus; MS, maize streak virus; WD, wheat dwarf virus.) Sequences of the compared viruses were translated from nucleotide sequences of Hamilton et al. (1984), Howarth et al. (1985), MacDowell et al. (1985), Mullineaux et al. (1984), Stanley and Gay (1983), and Stanley et al. (1986).
(Gröning et al., 1990). Here we document putative genomic structures which fit with a prokaryotic-like genome organisation. Experiments are in progress to determine under which conditions the prokaryotic ORFs are expressed. We assume that AbMV has the capability to enter the nucleus as well as plastids. Although there has been a long controversy over whether other plant viruses can enter plastids (Reinero and Beachy, 1986; Rochon and Siegel, 1984; Shalla et al., 1975; Siegel, 1971), evidence was recently presented that TMV is able to do so (Schoelz and Zaitlin, 1989). TMV RNA was translated in vitro in E. coli and in plastid lysates (Camerino et al., 1982; Glover and Wilson, 1982). For a geminivirus it was shown that the promoter of the coat protein is active in E. coli (Petty et al., 1986). Geminate particles are mostly localized in the bundle area (Abouzid et al., 1988a; Harrison, 1985), whereas virus-like structures in AbMV-infected Malva parviflora plants were found in the plastids of palisade and spongy parenchyma (Jeske and Werz, 1980). In studies on isolated plastids (Gröning et al., 1987, 1990) and isolated nuclei (Abouzid, 1988) the organelles from different tissues of the leaf were pooled and their origins could not be discriminated. In situ hybridization experiments (Horns, T., unpublished data) showed that in (TGMV)
PT   CTCTTTTTTTTGG--AAGATCCCCTATAATAATGGATTTCTGC
AB   CAA-CA--CGTGGCGGCCATCCGCTATAATATT--ACCGGATGGCCGC
TG   GGG-CA--CGTGGCGGCCATCCGTT-TAATATT--ACCGGATGGCCGC
BC   CAA-CTTTCATAAGGGCCATCCGTTATAATATT--ACCGGATGGCC-C
CL   GAA-CACCCAAGG-GGCCAACCG-TATAATATT--ACCGGTTGGCCCC
BG   CATACA--CGTGGCGGCCATCCGATATAATATT--ACCGGATGGCCGC
AD                            ATAATA-T--ACC

FIG. 7. Comparison of the conserved hairpin loop structure of various geminiviruses with a part of chloroplast DNA from tobacco (nucleotides 2376-2421; Shinozaki et al., 1986) and the consensus sequence for the origin of replication of adenoviruses (Graham et al., 1989). Geminivirus sequences were taken from the literature cited in Fig. 6. AD, adenovirus; AB, BC, BG, CL, and TG are as in Fig. 6.
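The statement that the hairpin region holds the most conserved sequence among the geminiviruses can be illustrated by extracting the longest stretch shared by all five geminivirus rows of Fig. 7. This is a minimal sketch, assuming the rows are transcribed correctly from the figure (with each unreadable "M" read as "AA"); the brute-force search and the names used are illustrative and not taken from the paper.

```python
# Geminivirus rows of Fig. 7, gaps included as printed.
# Assumption: characters damaged in extraction have been read as "AA".
FIG7_ROWS = {
    "AB": "CAA-CA--CGTGGCGGCCATCCGCTATAATATT--ACCGGATGGCCGC",
    "TG": "GGG-CA--CGTGGCGGCCATCCGTT-TAATATT--ACCGGATGGCCGC",
    "BC": "CAA-CTTTCATAAGGGCCATCCGTTATAATATT--ACCGGATGGCC-C",
    "CL": "GAA-CACCCAAGG-GGCCAACCG-TATAATATT--ACCGGTTGGCCCC",
    "BG": "CATACA--CGTGGCGGCCATCCGATATAATATT--ACCGGATGGCCGC",
}


def ungapped(row):
    """Remove alignment gap characters from a row."""
    return row.replace("-", "")


def longest_common_substring(seqs):
    """Return the longest substring present in every sequence (brute force)."""
    seqs = list(seqs)
    ref = min(seqs, key=len)
    for length in range(len(ref), 0, -1):
        for i in range(len(ref) - length + 1):
            candidate = ref[i:i + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


if __name__ == "__main__":
    sequences = [ungapped(r) for r in FIG7_ROWS.values()]
    print(longest_common_substring(sequences))
```

The brute-force search is adequate for sequences of this length; for whole genomes a suffix-based method would be the more usual choice.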
infected Abutilon sellovianum plants viral DNA is limited to nuclei and plastids of the phloem.

ACKNOWLEDGMENTS

We thank Prof. Dr. W. O. Abel and Dr. D. Evans for helpful discussions, Prof. Dr. B. Wittig (SoftGene, Berlin) for providing the MacMolly computer program, and R. Schmidt for help in preparation and typing of the English text. This work was supported by a grant from the Deutsche Forschungsgemeinschaft (Je 116/6) and the Bundesminister für Forschung und Technologie (BCT 507).
REFERENCES

ABOUZID, A. M. (1988). "Isolierung, Charakterisierung und Eigenschaften des Abutilon Mosaik Virus," Dissertation Fachbereich Biologie. Hamburg.
ABOUZID, A. M., BARTH, A., and JESKE, H. (1988a). Immunogold labeling of the abutilon mosaic virus in ultrathin sections of epoxy resin embedded leaf tissue. J. Ultrastruct. Res. 99, 39-47.
ABOUZID, A. M., FRISCHMUTH, T., and JESKE, H. (1988b). A putative replicative form of the abutilon mosaic virus (gemini group) in a chromatin-like structure. MGG, Mol. Gen. Genet. 212, 252-258.
ABOUZID, A. M., and JESKE, H. (1986). The purification and characterization of gemini particles from abutilon mosaic virus infected Malvaceae. J. Phytopathol. 115, 344-353.
ACCOTTO, G., DONSON, J., and MULLINEAUX, P. (1989). Mapping of Digitaria streak virus transcripts reveals different RNA species from the same transcription unit. EMBO J. 8, 1033-1039.
BAUR, E. (1906). Über die infektiöse Chlorose der Malvaceen. Kgl. Preuss. Akad. Wiss. 1, 11-29.
CAMERINO, G., SAVI, A., and CIFERRI, O. (1982). A chloroplast system capable of translating heterologous mRNAs. FEBS Lett. 150, 94-98.
CORPET, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10,881-10,890.
DONSON, J., ACCOTTO, G., BOULTON, M., MULLINEAUX, P., and DAVIES, J. (1987). The nucleotide sequence of a geminivirus from Digitaria sanguinalis. Virology 161, 160-169.
FRISTENSKY, B., LIS, J., and WU, R. (1982). Portable microcomputer software for nucleotide sequence analysis. Nucleic Acids Res. 10, 6451-6463.
GLOVER, J., and WILSON, T. (1982). Efficient translation of the coat protein cistron of tobacco mosaic virus in a cell-free system from Escherichia coli. Eur. J. Biochem. 122, 485-492.
GRAHAM, F., RUDY, J., and BRINKLEY, P. (1989). Infectious circular DNA of human adenovirus type 5: Regeneration of viral DNA termini from molecules lacking terminal sequences. EMBO J. 8, 2077-2085.
GRIMSLEY, N., HOHN, B., HOHN, T., and WALDEN, H. (1986). "Agroinfection", an alternative route for viral infection of plants by using Ti plasmid. Proc. Natl. Acad. Sci. USA 83, 3282-3286.
GRÖNING, B. R., ABOUZID, A. M., and JESKE, H. (1987). Single-stranded DNA from abutilon mosaic virus (AbMV) is present in the plastids of infected Abutilon sellovianum. Proc. Natl. Acad. Sci. USA 84, 8996-9000.
GRÖNING, B. R., FRISCHMUTH, T., and JESKE, H. (1990). Replicative form DNA of abutilon mosaic virus is present in plastids. Mol. Gen. Genet. 220, 485-488.
HAMILTON, W. D., STEIN, V. E., COUTTS, R. H. A., and BUCK, K. W. (1984). Complete nucleotide sequence of the infectious cloned DNA components of tomato golden mosaic virus: Potential coding regions and regulatory sequences. EMBO J. 3, 2197-2205.
HARRISON, B. D. (1985). Advances in geminivirus research. Annu. Rev. Phytopathol. 23, 55-82.
HERTZSCH, W. (1928). Beiträge zur infektiösen Chlorose. Z. Bot. 20, 65-80.
HOWARTH, A., CATON, J., BOSSERT, M., and GOODMAN, R. (1985). Nucleotide sequence of bean golden mosaic virus and a model for gene regulation. Proc. Natl. Acad. Sci. USA 82, 3572-3576.
HOWARTH, A. J., and VANDEMARK, G. J. (1989). Phylogeny of geminiviruses. J. Gen. Virol. 70, 2717-2727.
JESKE, H., and WERZ, G. (1980). Cytochemical characterization of plastidal inclusions in Abutilon mosaic-infected Malva parviflora mesophyll cells. Virology 106, 155-158.
KALLENDER, H., PETTY, I. T. D., STEIN, V. E., PANICO, M., BLENCH, I. P., ETIENNE, A. T., MORRIS, H. R., COUTTS, R. H. A., and BUCK, K. W. (1988). Identification of the coat protein gene of tomato golden mosaic virus. J. Gen. Virol. 69, 1351-1357.
LARSON, R., and MESSING, J. (1983). Apple II software for M13 shotgun DNA sequencing. Nucleic Acids Res. 10, 39-49.
LAZAROWITZ, S. G. (1987). The molecular characterization of geminiviruses. Plant Mol. Biol. Rep. 4, 177-192.
MACDOWELL, S., MACDONALD, H., HAMILTON, W., COUTTS, R., and BUCK, K. (1985). The nucleotide sequence of cloned wheat dwarf virus DNA. EMBO J. 4, 2173-2180.
MULLINEAUX, M., DONSON, J., MORRIS-KRSINICH, B., BOULTON, M., and DAVIES, J. (1984). The nucleotide sequence of maize streak virus DNA. EMBO J. 3, 3063-3068.
PETTY, I., COUTTS, R., and BUCK, K. (1986). Geminivirus coat protein gene promoter sequences can function in Escherichia coli. Nucleic Acids Res. 14, 5113.
REGEL, E. (1875). "Gartenflora." Erlangen.
REINERO, A., and BEACHY, R. (1986). Association of TMV coat protein with chloroplast membranes in virus-infected leaves. Plant Mol. Biol. 6, 291-301.
ROCHON, D., and SIEGEL, A. (1984). Chloroplast DNA transcripts are encapsidated by tobacco mosaic virus coat protein. Proc. Natl. Acad. Sci. USA 81, 1719-1723.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
SCHALK, H.-J., MATZEIT, V., SCHILLER, B., SCHELL, J., and GRONENBORN, B. (1989). Wheat dwarf virus, a geminivirus of graminaceous plants needs splicing for replication. EMBO J. 8, 359-364.
SCHOELZ, J., and ZAITLIN, M. (1989). Tobacco mosaic virus RNA enters chloroplasts in vivo. Proc. Natl. Acad. Sci. USA 86, 4496-4500.
SHALLA, T., PETERSEN, L., and GUINCHEDI, L. (1975). Partial characterization of virus-like particles in chloroplasts of plants infected with the U5 strain of TMV. Virology 66, 94-105.
SHINOZAKI, K., OHME, M., TANAKA, M., WAKASUGI, T., HAYASHIDA, N., MATSUBAYASHI, T., ZAITA, N., CHUNWONGSE, J., OBOKATA, J., YAMAGUCHI-SHINOZAKI, K., OHTO, C., TORAZAWA, K., MENG, B., SUGITA, M., DENO, H., KAMOGASHIRA, T., YAMADA, K., KUSUDA, J., TAKAIWA, F., KATO, A., TOHDOH, N., SHIMADA, H., and SUGIURA, M. (1986). The complete nucleotide sequence of the tobacco chloroplast genome: Its gene organization and expression. EMBO J. 5, 2043-2049.
SIEGEL, A. (1971). Pseudovirions of tobacco mosaic virus. Virology 46, 50-59.
STADEN, R. (1984). Measurement of the effects that coding for a protein has on a DNA sequence and their use for finding genes. Nucleic Acids Res. 12, 551-567.
STANLEY, J., and DAVIES, J. (1985). Structure and function of the DNA genome of geminiviruses. In "Molecular Plant Virology" (J. Davies, Ed.), Vol. II, pp. 191-218. CRC Press, Boca Raton, FL.
STANLEY, J., and GAY, M. (1983). Nucleotide sequence of cassava latent virus DNA. Nature (London) 301, 260-262.
STANLEY, J., MARKHAM, P., CALLIS, R., and PINNER, M. (1986). The nucleotide sequence of an infectious clone of the geminivirus beet curly top virus. EMBO J. 5, 1761-1767.
SUGITA, M., SHINOZAKI, K., and SUGIURA, M. (1985). Tobacco chloroplast tRNALys (UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron. Proc. Natl. Acad. Sci. USA 82, 3557-3561.
TOWNSEND, R., STANLEY, J., CURSON, S. J., and SHORT, M. N. (1985). Major polyadenylated transcripts of cassava latent virus and location of the gene encoding coat protein. EMBO J. 4, 33-37.