The nucleotide sequence of abutilon mosaic virus reveals prokaryotic as well as eukaryotic features


VIROLOGY 178, 461-468 (1990)

T. FRISCHMUTH,¹ G. ZIMMAT, AND H. JESKE²

Institut für Allgemeine Botanik, Angewandte Molekularbiologie der Pflanzen, Ohnhorststrasse 18, D-2000 Hamburg 52, Federal Republic of Germany

Received October 24, 1989; accepted May 23, 1990

The complete nucleotide sequence of abutilon mosaic virus (West Indian isolate, AbMV) is presented. The resulting genomic structure resembles that of other geminiviruses which are transmitted by the whitefly Bemisia tabaci: AbMV possesses a bipartite circular genome with bidirectional orientation of the open reading frames (ORFs). Both components have a common region of 180 bases with 99% homology, while the rest of their sequences is distinct. Eukaryotic regulatory transcription elements precede most ORFs and polyadenylation signals are present at the end of most ORFs. However, two ORFs show features of prokaryotic genes. This chimaeric genome organisation is discussed with reference to the finding that AbMV DNA is present in plastids as well as in the nucleus of infected cells. © 1990 Academic Press, Inc.

¹ Present address: John Innes Institute, Colney Lane, Norwich NR4 7UH, U.K.

² To whom requests for reprints should be addressed.

0042-6822/90 $3.00. Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Abutilon mosaic virus (AbMV) is a member of the geminivirus group, which is characterized by single-stranded circular DNA and a twin particle morphology (Harrison, 1985; Abouzid and Jeske, 1986). It possesses a bipartite genome as do other geminiviruses which are transmitted by the whitefly Bemisia tabaci (Abouzid et al., 1988b). AbMV particles were localized in the nuclei of bundle area cells of infected plants (Abouzid et al., 1988a). A replicative intermediate was purified from a chromatin-like structure (Abouzid et al., 1988b). However, viral DNA was also found in the plastids (Gröning et al., 1987, 1990). The localization of viral DNA in plastids as well as in nuclei should require adaptation to the completely different genomic organisation of a eukaryotic and a prokaryotic-like environment. We asked whether this is reflected in structural elements of the viral genome. To answer this question the complete genome of AbMV was sequenced and its composition analysed.

MATERIALS AND METHODS

Plants, virus, and clones

AbMV DNA was isolated from Abutilon sellovianum var. marmorata REGEL and cloned into M13mp8 and M13mp19 vectors as previously described (Abouzid et al., 1988b). Full-length clones of DNA A (pAbMV, 100) and DNA B (pAbMV, 200) were obtained by insertion into the unique PstI and SacI sites, respectively. To analyse the insertion sites, subclones of viral DNA which extend over the PstI and SacI sites were isolated from infected plants. Subclones of the full-length clones were produced with different restriction enzymes to reveal overlapping sequences (Fig. 1). Cloning was performed under German safety guidelines L2/B1 according to the licence of the ZKBS 1526/1.

FIG. 1. Nucleotide sequence strategy of the AbMV DNA. Full-length clones were subcloned (arrows) with the indicated restriction enzymes and sequenced with the chain termination technique according to Sanger et al. (1977): B, BamHI; Bg, BglII; E, EcoRI; H, HpaII; Ha, HaeIII; Hc, HincII; Hd, HindIII; K, KpnI; P, PvuI; Ps, PstI; S, SalI; Sa, SacI.

Sequence analysis

The nucleotide sequence was determined by the chain termination technique of Sanger et al. (1977) using [35S]thio-dATP as a radioactive label and the Sequenase kit (USBC) according to the manufacturer's recommendations. The sequence was evaluated with a number of different computer programs (Corpet, 1988; Fristensky et al., 1982; Larson and Messing, 1983; MacMolly, SoftGene, Berlin).

RESULTS

The sequencing strategy is documented in Fig. 1. The sequence was completely determined in both orientations. The resulting sequence is given in Figs. 2a and 2b for DNA A (2629 b) and DNA B (2581 b), respectively. The sequences are unique except for a small area of 180 b (99% homology): the so-called "common region" (CR) (Fig. 3). The DNA molecules are circular (Abouzid et al., 1986) and nucleotide 1 is arbitrarily set to the first nucleotide of the CR, following the convention for geminiviruses with bipartite genomes (Lazarowitz, 1987). The CR contains an inverted repeat which is highly conserved among all geminiviruses and which is a candidate structure for the origin of replication (Fig. 3) (Stanley and Davies, 1985).

[Fig. 2: sequence listings of DNA A (a) and DNA B (b), numbered in blocks of 60 nucleotides.]

FIG. 2. Nucleotide sequences of AbMV DNAs: (a) DNA A; (b) DNA B. The DNA composition is 24% A, 23% C, 24% G, and 29% T for DNA A and 25% A, 22% C, 25% G, and 29% T for DNA B. The accession numbers for the EMBL data base (Heidelberg, FRG) are X15983 and X15984.

To analyse the coding capacity of the genome the sequences were searched for open reading frames

AbMV A  CGGTGGCATTTATGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG 90
        ******* *** ******************************************************************************
AbMV B  CGGTGGCGTTTTTGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACTTGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG 90

        ..............................................>>>>>>>>>>>............<<<<<<<<<<<..........
AbMV A  AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGGCCATCCGCTATAATATTACCGGATGGCCGCGCGACCCCCC 180
        ******************************************************************************************
AbMV B  AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGGCCATCCGCTATAATATTACCGGATGGCCGCGCGACCCCCC 180

Hairpin: nucleotides 137-147 (GCGGCCATCCG) pair with nucleotides 160-170 (CGGATGGCCGC); the loop comprises nucleotides 148-159 (CTATAATATTAC).

FIG. 3. (Top) Comparison of the homologous region of AbMV DNA A and DNA B. Arrows indicate inverted repeats, and identical nucleotides are aligned. The numbers refer to the nucleotide numbers of DNA A as well as DNA B. (Bottom) Representation of a potential hairpin loop in the common region. Its calculated free energy value is ΔG (25°) = -28 kcal.
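The 99% homology quoted for the common region and the inverted repeat in Fig. 3 can be checked with a few lines of Python. The following is only a minimal sketch: the sequences are transcribed from Fig. 3, while the function names and the slice coordinates chosen for the two arms of the repeat (nucleotides 137-147 and 160-170) are assumptions made for this illustration, not part of the original analysis.

```python
# Common-region sequences as transcribed from Fig. 3 (nucleotides 1-180).
# Both components share nucleotides 91-180 exactly.
CR_SHARED_91_180 = (
    "AGGTCTTTATATACTAGAACTCTCATTAACGGATTTGCAACACGTGGCGG"
    "CCATCCGCTATAATATTACCGGATGGCCGCGCGACCCCCC"
)
CR_A = (
    "CGGTGGCATTTATGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACT"
    "TGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG"
) + CR_SHARED_91_180
CR_B = (
    "CGGTGGCGTTTTTGTAATAAGAAGGGGTACTCTGGATGAGTTACTCCACT"
    "TGAGGCTCCTCAAAACTTGCTCATGTAATTGGAGTATTGG"
) + CR_SHARED_91_180


def mismatch_positions(a, b):
    """Return 1-based positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a, b):
    """Percentage of identical positions in two pre-aligned sequences."""
    return 100.0 * (len(a) - len(mismatch_positions(a, b))) / len(a)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Arms of the inverted repeat (assumed coordinates, 1-based 137-147 and 160-170).
LEFT_ARM = CR_A[136:147]
RIGHT_ARM = CR_A[159:170]

if __name__ == "__main__":
    print(mismatch_positions(CR_A, CR_B))          # [8, 12]
    print(round(percent_identity(CR_A, CR_B), 1))  # 98.9
    print(reverse_complement(LEFT_ARM) == RIGHT_ARM)  # True
```

With two mismatches in 180 positions the identity is 178/180, which rounds to the 99% given in the text; the two arms are exact reverse complements of each other, as expected for a hairpin stem.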

(ORF) for proteins larger than 10 kDa, assuming that the first ATG is the start point (Fig. 4). In both DNAs the viral as well as the complementary sequences contain ORFs. Potential eukaryotic promoter structures are indicated in the maps (Fig. 4, triangles). Polyadenylation sites are present opposite the common region at the junction of two ORFs from different directions (Fig. 4, P). While in DNA B only one frame is used, in DNA A three frames in the complementary sense contain overlapping ORFs. To facilitate the comparison with other geminiviruses we follow the internationally accepted nomenclature for viral (V) and complementary (C) sense ORFs in DNA A and DNA B (Fig. 4). The overall genomic organisation is equivalent to that of other geminiviruses with a bipartite genome (Lazarowitz, 1987). So far, the sequence contains sufficient regulatory elements to control transcription in the nucleus.

Because we found viral DNA in the plastids (Gröning et al., 1987, 1990), we further addressed the question of whether the viral genome possesses the structural capability for expression in a prokaryotic-like genetic system. As an indicator we looked for ribosome-binding sites for 70 S ribosomes (Shine-Dalgarno (SD) sequences) in the correct position relative to an open reading frame, assuming that ATG as well as GTG can serve as start codon in such a system. We found one area which is a good candidate for prokaryotic expression (Fig. 5). A duplication of the SD sequence, unprecedented to our knowledge, is present close to an ATG as well as a GTG. The following ORFs (pro 1 and pro 2) are located in tandem using the same frame as AC1 (Fig. 4). Looking for potential promoter structures in front of these ORFs, we detected a consensus sequence of Escherichia coli promoters with a considerable score of similarity using the algorithm of Staden (1984) (Fig. 5: -35; -10; +1, underlined).

As AbMV is the only geminivirus to date whose DNA has been detected in the plastids, we compared the sequences of several group members in the putative prokaryotic region. None of the geminiviruses with closely related AC1 ORFs (BCTV, BGMV, CLV, and TGMV) possesses all the prokaryotic characteristics presented here. The most closely related in this respect, BCTV, is homologous in the consensus sequence of the E. coli promoter and bears a core SD sequence (AGGA) 18 b in front of the ATG of pro 1, but the SD sequence as well as the start codon of pro 2 is deleted. In CLV an SD sequence (AGGAGA) is present 13 b in

AbMV DNA A

ORF    mol. wt.    nucleotide
V1     28 000      388 - 1110
C1     40 200      …9 - 1574
C2     14 400      1644 - 1258
C3     15 900      1508 - 1113

P (1110-1150):
V1 -> ***
CGCTCATGAATTAATAAAATTTGAATTTTATTGAATGATTTTCCAGTACAT
GCGAGTACTTAATTATTTTAAACTTAAAATAACTTACTAAAAGGTCATGTA
*** <- C3

AbMV DNA B

ORF    mol. wt.    nucleotide
V1     29 800      548 - 1318
C1     32 900      2253 - 1377

P (1320-1380):
V1 -> ***
AGTGAATAAATGAATTATTTAAAGTTGATCATCTTATTTGTACAAGCAAAACATACAATTATTT
TCACTTATTTACTTAATAAATTTCAACTAGTAGAATAAACATGTTCGTTTTGTATGTTAATAAA
*** <- C1

FIG. 4. The genomic organisation of AbMV DNA A and DNA B showing the orientation of open reading frames with a coding capacity for proteins with a molecular weight higher than 10,000. (Solid arrows) ORFs with eukaryotic transcription signals; (triangles) TATA box; (P) polyadenylation signal; (dashed arrows) ORFs with prokaryotic features; (CR) common region (see Fig. 3). The tables summarize nucleotide start and stop positions as well as coding capacity of the ORFs in viral (V) and complementary sense (C) DNA. Sequences at position P below the maps demonstrate potential polyadenylation signals (underlined) in relation to stop codons (***).
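The ORF search summarized in Fig. 4 (both strands of a circular molecule, first ATG as start, products larger than 10 kDa) can be sketched as follows. This is an illustrative sketch only, not the program used by the authors: the function name `find_orfs_circular`, the standard stop codons, and the rough average residue mass of 110 Da (so that 10 kDa corresponds to about 91 codons) are all assumptions introduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code (assumption)
AVG_RESIDUE_DA = 110                  # rough average residue mass (assumption)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs_circular(seq, min_codons=91):
    """List ORFs on both strands of a circular DNA.

    Returns tuples (strand, first_nt, last_nt, codons, approx_mass_da) with
    1-based viral-strand coordinates; the stop codon is not included.
    For each stop codon only the longest ORF is kept, which corresponds to
    taking the first ATG after the preceding in-frame stop.
    """
    seq = seq.upper()
    n = len(seq)
    best = {}  # (strand, stop index) -> (codons, start index)
    for strand, template in (("V", seq), ("C", reverse_complement(seq))):
        doubled = template + template  # lets a frame run across the origin
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            for codon_index in range(n // 3):
                pos = start + 3 * codon_index
                if doubled[pos:pos + 3] in STOP_CODONS:
                    key = (strand, pos % n)
                    if (codon_index >= min_codons
                            and codon_index > best.get(key, (0,))[0]):
                        best[key] = (codon_index, start)
                    break
    orfs = []
    for (strand, _stop), (codons, start) in best.items():
        last = (start + 3 * codons - 1) % n  # 0-based index of last coding base
        if strand == "V":
            first_nt, last_nt = start + 1, last + 1
        else:  # map reverse-complement indices back to viral-strand numbering
            first_nt, last_nt = n - start, n - last
        orfs.append((strand, first_nt, last_nt, codons, codons * AVG_RESIDUE_DA))
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAACCCGGGTTTTAACC"   # hypothetical 22-nt circle, not viral sequence
    print(find_orfs_circular(toy, min_codons=3))
    rotated = toy[10:] + toy[:10]    # same circle with a different origin
    print(find_orfs_circular(rotated, min_codons=3))
```

Because the template is doubled before scanning, an ORF that crosses nucleotide 1 is reported with a start coordinate larger than its end coordinate on the viral strand, and complementary-sense ORFs are reported with descending coordinates, as in the tables of Fig. 4.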

front of a GTG, comparable to the start codon of pro 2, but the promoter consensus sequence as well as the SD sequence of pro 1 is absent. The sequences of BGMV and TGMV do not show any striking similarities to prokaryotic transcription or translation elements. When the deduced protein sequences of various C1 ORFs were aligned, a relatively high degree of diversity was found around the beginning of the pro 1 ORF, although the rest of the sequences are highly conserved (Fig. 6). It is noteworthy that in the monopartite geminiviruses with monocotyledon hosts a protein sequence homologous to C1 is split into two ORFs in different frames at exactly this locus (Donson et al., 1987; MacDowell et al., 1985; Mullineaux et al., 1984). The start codon of the second ORF of WDV is GTG, which is functional in prokaryotes and plastids. Some evidence has been presented that these ORFs are joined after transcription by splicing (Accotto et al., 1989; Schalk et al., 1989).

To test whether any homology exists between AbMV DNA and the chloroplast genome its sequence was

[Fig. 5 sequence panel: reverse complement of DNA A nucleotides 2226 to 1429, marked with the E. coli promoter consensus elements -35 (s. = -31.7), -10 (s. = -23.5), and +1 (s. = -14.3); the deduced amino acid sequence begins Met-Ile-Ile-Leu-Gly-Val-Val-Pro-Leu-Arg-Pro-Asp-Arg and ends ...Arg ***.]

FIG. 5. Putative open reading frames with prokaryotic features in DNA A: Shine-Dalgarno sequences (double underline) and an E. coli promoter consensus sequence (single underline). The score numbers (s.) refer to the Staden (1984) algorithm for similarity of a given sequence with the consensus of E. coli promoters. For location of the sequence, see Fig. 4, pro 1 and pro 2; the DNA sequence is the reverse complement of nucleotides 2226 to 1429; (***) stop codons.
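The search for Shine-Dalgarno sequences in front of ATG or GTG start codons can be illustrated with a short scan. This is a hedged sketch rather than the authors' procedure: the function name, the use of the core motif AGGA as the default, the spacing window of 3 to 20 bases, and the way the spacing is counted (bases between the end of the SD core and the first base of the start codon) are assumptions chosen for this example. The scan does not check that the start codon opens a long reading frame.

```python
import re


def find_sd_start_pairs(seq, sd_core="AGGA", start_codons=("ATG", "GTG"),
                        min_gap=3, max_gap=20):
    """Find SD-like motifs followed by a start codon within a spacing window.

    Returns tuples (sd_position, start_position, codon, gap) with 1-based
    positions; gap is the number of bases between the SD core and the codon.
    """
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so that overlapping motif occurrences are all found.
    for match in re.finditer(f"(?={re.escape(sd_core)})", seq):
        sd_end = match.start() + len(sd_core)  # index just past the SD core
        for gap in range(min_gap, max_gap + 1):
            pos = sd_end + gap
            codon = seq[pos:pos + 3]
            if codon in start_codons:
                hits.append((match.start() + 1, pos + 1, codon, gap))
    return hits


if __name__ == "__main__":
    toy = "CCAGGAGGTTTAACATGAAACCC"  # hypothetical sequence, not from AbMV
    print(find_sd_start_pairs(toy))  # [(3, 15, 'ATG', 8)]
```

To examine complementary-sense candidates such as pro 1 and pro 2, the same scan would be run on the reverse complement of the viral strand.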

compared with that of tobacco chloroplast DNA (Shinozaki et al., 1986). No extensive homology was found in computer searches, but one small stretch of similarity was detected (Fig. 7). Although it is difficult to assess the significance of such a small sequence in the 150 kb of chloroplast DNA, we document it here because it is the most conserved sequence in all geminiviruses and is located in the viral hairpin loop, which is thought to be the origin of replication (Fig. 3). Moreover, part of this sequence is the conserved motif at the origin of replication of adenoviruses (Graham et al., 1989). The corresponding plastidal sequence is located in the intron of the tRNA(Lys) gene, which contains an unidentified open reading frame (Sugita et al., 1985).

DISCUSSION

Abutilon mosaic virus is one of the classical examples of early virology (Baur, 1906). By sequencing we have confirmed that its original West Indian isolate AbMV (Hertzsch, 1928; Regel, 1875) is a member of the geminivirus group. It is a close relative of bean golden mosaic virus and tomato golden mosaic virus, both from Latin America, as deduced from sequence comparisons (Howarth and Vandemark, 1989). The infectivity of the clones analysed here was proven by the technique of agroinfection (Grimsley et al., 1986) on Nicotiana benthamiana, Nicotiana clevelandii, Nicotiana tabacum var. samsun, Nicotiana tabacum var. xanthi, and Malva parviflora (data not shown). We conclude therefore that the sequence presented here contains the essential genetic structures for replication and spread throughout the plant.

Geminivirus particles are localized in the nucleus (Abouzid et al., 1988a; Harrison, 1985). Their DNA is transcribed to polyadenylated RNA (Accotto et al., 1989; Kallender et al., 1988; Schalk et al., 1989; Townsend et al., 1985) and most promoter structures are typical for nuclear transcription. Comparable transcription was found for AbMV in S1 nuclease protection and primer extension experiments (Frischmuth, S., unpublished data). We found viral ssDNA (Gröning et al., 1987) and replicative dsDNA (Gröning et al., 1990) in the plastids. Both genomic DNAs are present in these organelles


AB

MPPP-KKFRVOAKNYFLTYPOCSLTKDEALSOLONLETPVNKKF~KICRELHENGEPHLHVLlOFEGKYOCTNNRFFDLVSPTRSAHFHPNIDGAKSSSDVKSY~DKDGDT

BG

MPPP-ORFRVOSKNYFLTYPRCTlPKEEALSOLOKIHTTlNKKFlKVCEERHDNGEPHLHAL~OFEGKFICTNKRLFDLVSTTRSAHFHPN~OGAKSSSDVKEYIDKDGVT

TG

MPS,HLKRFOlNAKNYFLTYPOCSLSKEESLSOLOALNTPINKKF~K~CRELHEDGOPHLHVLIOFEGKYCCONORFFDLVSPTRSAHFHPNlORAKSSSDVKTYIDKDGDT

BC

MPPT-KRFRIOAKNIFLTYPOCSLSKEEALEO~OR~OLSSNKKY~KlARELHEDGOPHLHVLLOLEGKVOITNlRLFDLVSPTRSAHFHPNIORAKSSSDVKSYVDKDGDT

CL

MRTP--RFRIOAKNVFLTYPKCSlPKEHLLSFlOTLSLOSNPKFlKICRELHONGEPHLHAL~OFEGK~TITNNRLFDCVHPSCSTSFHPN~OGAKSSSDVKSYLDKDGDT

AB

AEUGEFOIDGRSARGGOOTANDSYAKALNAGDVOSALNILLER~FAKAPEPUVAGFP-SPLSRPFPRRCRSGR~lILG-WPLRPDRPLSLIV

BG

lEUGOFOVDGRSARGGOOSANDSYAKALNADSlESALTlLKEEOPKDYVLONHN~RSNLERlFFKVPEPWVPPFPLSSFVNlP~~OD~-DDYFGRGSAARPERPlSlIV

TG

LVWGEFOVDGRSARGGCOTSNDAAAEALNASSKEEALO~IREKIPEKYLFOFHNLNSNLDRIFDKTPEPULPPFHVSSFTN~PDE~ROU-AENYFGKS~MRPERP~SIII

BC

lEUGEFOlDGRSARGGOOTANDSYAKALNATSLDOALOlLKEEOPKDYFLOHHNLLNNAOKIFORPPDPUTPLFPLSSFTNVPEE~OEU-ADAYFGVDAAARPLRYNSIIV

CL

VEUGOFOIDGRSARGGOOSANDAYAKALNSGSKSEALNVIRELVPKDFVLOFHNLNSNLDRlFOEPPAPYVSPFPCSSFDOVPVEIEEWVADNV--RDSAARPURPNSIVI


MS

IjDGFCIOSSDERSRKOSLYl

UD

BGRLFO-ESPGRHK-SIYI


AB

EGDSRTGKTMUARALGPHNYLSGHLDFNGRVYSNEVEYNVlDDVAPHYLKLKHUKELLGAOKD~SNCKLAKP--VOIKGGIRAIVLCNPGEGSSYKEYLDKEE-NRG...

BG

EGDSRTGKTMUARALGPHNYLSGHLDFNSL~SNSVEYNVlDDlTPNYLKLKDUKELIGEOKD~SNCKYGKP--VOIKGGIPSIVLCNPGEGSSYKDFLNKEEK-PA...

TG

EGDSRTGKTMUARSLGPHNYLSGHLDLNSRVYSNKVEYNVlDDVTPOYLKLKHUKEL~GAORD~TNCKYGKP--VOlKGGIPSIVLCNPGEGASYKVFLDKEE-NTP...

BC

EGDSRTGKTMUARSLGAHNYlTGHLDFSPRTYYDEVEYNVlDD~PTYLKMKHUKHLIGAOKE~TNLKYGKPR-V-lKGGIPClILCNPGPESSYOOFLEKPE-NEA...

CL

EGDSRTGKTIUARSLGPHNYLCGHLDLSPKVFNNMUYNVlDD~PHYLK~-HFKEFMGSORDVPSNTKYGKP--VOIKGGIPTIFLCNPGPTSSYKEFLA-EEKOEA...

MS

VGPTRTGKSTUARSLGVHNYWPNNVDWSS--YNEDAIYNIVDD--lPFKFCPCUKOLVGCORDFIVNPKYGKKKKVOKKSK-PTIILANSD-EDWMKE--MTPGOLEY...

UD

CGPTRTGKTSUARSLGTHNYYNSLVDFTT--YDVNAKYNI~DD~~IPFKFTPNUKCFVGAORDFTVNPKYGK-RKV-~RGGIPCl~LVNPD~EDULKD--MTPEOSDY...


FIG. 6. Comparison of the C1 ORFs of geminiviruses with dicot hosts (AB, BC, BG, CL, TG) and the C2 of those with monocot hosts (MS, WD). The arrow above AB indicates the beginning of the prokaryotic-like ORF pro 1 of AbMV. Possible internal start codons near this position in related geminiviruses are underlined. (AB, AbMV; BC, beet curly top virus; BG, bean golden mosaic virus; TG, tomato golden mosaic virus; CL, cassava latent virus; MS, maize streak virus; WD, wheat dwarf virus.) Sequences of the compared viruses were translated from nucleotide sequences of Hamilton et al. (1984), Howarth et al. (1985), MacDowell et al. (1985), Mullineaux et al. (1984), Stanley and Gay (1983), and Stanley et al. (1986).

(Gröning et al., 1990). Here we document putative genomic structures which fit with a prokaryotic-like genome organisation. Experiments are in progress to determine under which conditions the prokaryotic ORFs are expressed.

We assume that AbMV has the capability to enter the nucleus as well as plastids. Although there has been a long controversy over whether other plant viruses can enter plastids (Reinero and Beachy, 1986; Rochon and Siegel, 1984; Shalla et al., 1975; Siegel, 1971), evidence was recently presented that TMV is able to do so (Schoelz and Zaitlin, 1989). TMV RNA was translated in vitro in E. coli and in plastid lysates (Camerino et al., 1982; Glover and Wilson, 1982). For a geminivirus it was shown that the promoter of the coat protein is active in E. coli (Petty et al., 1986).

Geminate particles are mostly localized in the bundle area (Abouzid et al., 1988a; Harrison, 1985), whereas virus-like structures in AbMV-infected Malva parviflora plants were found in the plastids of palisade and spongy parenchyma (Jeske and Werz, 1980). In studies on isolated plastids (Gröning et al., 1987, 1990) and isolated nuclei (Abouzid, 1988) the organelles from different tissues of the leaf were pooled and their origins could not be discriminated. In situ hybridization experiments (Horns, T., unpublished data) showed that in (TGMV)

PT  CTCTTTTTTTTGG--AAGATCCCCTATAATAATGGATTTCTGC
AB  CAA-CA--CGTGGCGGCCATCCGCTATAATATT--ACCGGATGGCCGC
TG  GGG-CA--CGTGGCGGCCATCCGTT-TAATATT--ACCGGATGGCCGC
BC  CAA-CTTTCATAAGGGCCATCCGTTATAATATT--ACCGGATGGCC-C
CL  GAA-CACCCAAGG-GGCCAACCG-TATAATATT--ACCGGTTGGCCCC
BG  CATACA--CGTGGCGGCCATCCGATATAATATT--ACCGGATGGCCGC
AD                           ATAATA-T--ACC

FIG. 7. Comparison of the conserved hairpin loop structure of various geminiviruses with a part of chloroplast DNA from tobacco (nucleotides 2376-2421; Shinozaki et al., 1986) and the consensus sequence for the origin of replication of adenoviruses (Graham et al., 1989). Geminivirus sequences were taken from the literature cited in Fig. 6. AD, adenovirus; AB, BC, BG, CL, and TG are as in Fig. 6.

infected Abutilon sellovianum plants viral DNA is limited to nuclei and plastids of the phloem.

ACKNOWLEDGMENTS

We thank Prof. Dr. W. O. Abel and Dr. D. Evans for helpful discussions, Prof. Dr. B. Wittig (SoftGene, Berlin) for providing the MacMolly computer program, and R. Schmidt for help in preparation and typing of the English text. This work was supported by a grant from the Deutsche Forschungsgemeinschaft (Je 116/6) and the Bundesminister für Forschung und Technologie (BCT 507).

REFERENCES

ABOUZID, A. M. (1988). "Isolierung, Charakterisierung und Eigenschaften des Abutilon Mosaik Virus," Dissertation Fachbereich Biologie, Hamburg.
ABOUZID, A. M., BARTH, A., and JESKE, H. (1988a). Immunogold labeling of the abutilon mosaic virus in ultrathin sections of epoxy resin embedded leaf tissue. J. Ultrastruct. Res. 99, 39-47.
ABOUZID, A. M., FRISCHMUTH, T., and JESKE, H. (1988b). A putative replicative form of the abutilon mosaic virus (gemini group) in a chromatin-like structure. MGG, Mol. Gen. Genet. 212, 252-258.
ABOUZID, A. M., and JESKE, H. (1986). The purification and characterization of gemini particles from abutilon mosaic virus infected Malvaceae. J. Phytopathol. 115, 344-353.
ACCOTTO, G., DONSON, J., and MULLINEAUX, P. (1989). Mapping of Digitaria streak virus transcripts reveals different RNA species from the same transcription unit. EMBO J. 8, 1033-1039.
BAUR, E. (1906). Über die infektiöse Chlorose der Malvaceen. Kgl. Preuss. Akad. Wiss. 1, 11-29.
CAMERINO, G., SAVI, A., and CIFERRI, O. (1982). A chloroplast system capable of translating heterologous mRNAs. FEBS Lett. 150, 94-98.
CORPET, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10,881-10,890.
DONSON, J., ACCOTTO, G., BOULTON, M., MULLINEAUX, P., and DAVIES, J. (1987). The nucleotide sequence of a geminivirus from Digitaria sanguinalis. Virology 161, 160-169.
FRISTENSKY, B., LIS, J., and WU, R. (1982). Portable microcomputer software for nucleotide sequence analysis. Nucleic Acids Res. 10, 6451-6463.
GLOVER, J., and WILSON, T. (1982). Efficient translation of the coat protein cistron of tobacco mosaic virus in a cell-free system from Escherichia coli. Eur. J. Biochem. 122, 485-492.
GRAHAM, F., RUDY, J., and BRINKLEY, P. (1989). Infectious circular DNA of human adenovirus type 5: Regeneration of viral DNA termini from molecules lacking terminal sequences. EMBO J. 8, 2077-2085.
GRIMSLEY, N., HOHN, B., HOHN, T., and WALDEN, H. (1986). "Agroinfection", an alternative route for viral infection of plants by using Ti plasmid. Proc. Natl. Acad. Sci. USA 83, 3282-3286.
GRÖNING, B. R., ABOUZID, A. M., and JESKE, H. (1987). Single-stranded DNA from abutilon mosaic virus (AbMV) is present in the plastids of infected Abutilon sellovianum. Proc. Natl. Acad. Sci. USA 84, 8996-9000.
GRÖNING, B. R., FRISCHMUTH, T., and JESKE, H. (1990). Replicative form DNA of abutilon mosaic virus is present in plastids. Mol. Gen. Genet. 220, 485-488.
HAMILTON, W. D., STEIN, V. E., COUTTS, R. H. A., and BUCK, K. W. (1984). Complete nucleotide sequence of the infectious cloned DNA components of tomato golden mosaic virus: Potential coding regions and regulatory sequences. EMBO J. 3, 2197-2205.
HARRISON, B. D. (1985). Advances in geminivirus research. Annu. Rev. Phytopathol. 23, 55-82.
HERTZSCH, W. (1928). Beiträge zur infektiösen Chlorose. Z. Bot. 20, 65-80.
HOWARTH, A., CATON, J., BOSSERT, M., and GOODMAN, R. (1985). Nucleotide sequence of bean golden mosaic virus and a model for gene regulation. Proc. Natl. Acad. Sci. USA 82, 3572-3576.
HOWARTH, A. J., and VANDEMARK, G. J. (1989). Phylogeny of geminiviruses. J. Gen. Virol. 70, 2717-2727.
JESKE, H., and WERZ, G. (1980). Cytochemical characterization of plastidal inclusions in abutilon mosaic-infected Malva parviflora mesophyll cells. Virology 106, 155-158.
KALLENDER, H., PETTY, I. T. D., STEIN, V. E., PANICO, M., BLENCH, I. P., ETIENNE, A. T., MORRIS, H. R., COUTTS, R. H. A., and BUCK, K. W. (1988). Identification of the coat protein gene of tomato golden mosaic virus. J. Gen. Virol. 69, 1351-1357.
LARSON, R., and MESSING, J. (1983). Apple II software for M13 shotgun DNA sequencing. Nucleic Acids Res. 10, 39-49.
LAZAROWITZ, S. G. (1987). The molecular characterization of geminiviruses. Plant Mol. Biol. Rep. 4, 177-192.
MACDOWELL, S., MACDONALD, H., HAMILTON, W., COUTTS, R., and BUCK, K. (1985). The nucleotide sequence of cloned wheat dwarf virus DNA. EMBO J. 4, 2173-2180.
MULLINEAUX, M., DONSON, J., MORRIS-KRSINICH, B., BOULTON, M., and DAVIES, J. (1984). The nucleotide sequence of maize streak virus DNA. EMBO J. 3, 3063-3068.
PETTY, I., COUTTS, R., and BUCK, K. (1986). Geminivirus coat protein gene promoter sequences can function in Escherichia coli. Nucleic Acids Res. 14, 5113.
REGEL, E. (1875). "Gartenflora." Erlangen.
REINERO, A., and BEACHY, R. (1986). Association of TMV coat protein with chloroplast membranes in virus-infected leaves. Plant Mol. Biol. 6, 291-301.
ROCHON, D., and SIEGEL, A. (1984). Chloroplast DNA transcripts are encapsidated by tobacco mosaic virus coat protein. Proc. Natl. Acad. Sci. USA 81, 1719-1723.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
SCHALK, H.-J., MATZEIT, V., SCHILLER, B., SCHELL, J., and GRONENBORN, B. (1989). Wheat dwarf virus, a geminivirus of graminaceous plants needs splicing for replication. EMBO J. 8, 359-364.
SCHOELZ, J., and ZAITLIN, M. (1989). Tobacco mosaic virus RNA enters chloroplasts in vivo. Proc. Natl. Acad. Sci. USA 86, 4496-4500.
SHALLA, T., PETERSEN, L., and GIUNCHEDI, L. (1975). Partial characterization of virus-like particles in chloroplasts of plants infected with the U5 strain of TMV. Virology 66, 94-105.
SHINOZAKI, K., OHME, M., TANAKA, M., WAKASUGI, T., HAYASHIDA, N., MATSUBAYASHI, T., ZAITA, N., CHUNWONGSE, J., OBOKATA, J., YAMAGUCHI-SHINOZAKI, K., OHTO, C., TORAZAWA, K., MENG, B., SUGITA, M., DENO, H., KAMOGASHIRA, T., YAMADA, K., KUSUDA, J., TAKAIWA, F., KATO, A., TOHDOH, N., SHIMADA, H., and SUGIURA, M. (1986). The complete nucleotide sequence of the tobacco chloroplast genome: Its gene organization and expression. EMBO J. 5, 2043-2049.
SIEGEL, A. (1971). Pseudovirions of tobacco mosaic virus. Virology 46, 50-59.
STADEN, R. (1984). Measurement of the effects that coding for a protein has on a DNA sequence and their use for finding genes. Nucleic Acids Res. 12, 551-567.
STANLEY, J., and DAVIES, J. (1985). Structure and function of the DNA genome of geminiviruses. In "Molecular Plant Virology" (J. Davies, Ed.), Vol. II, pp. 191-218. CRC Press, Boca Raton, FL.
STANLEY, J., and GAY, M. (1983). Nucleotide sequence of cassava latent virus DNA. Nature (London) 301, 260-262.
STANLEY, J., MARKHAM, P., CALLIS, R., and PINNER, M. (1986). The nucleotide sequence of an infectious clone of the geminivirus beet curly top virus. EMBO J. 5, 1761-1767.
SUGITA, M., SHINOZAKI, K., and SUGIURA, M. (1985). Tobacco chloroplast tRNA(Lys) (UUU) gene contains a 2.5-kilobase-pair intron: An open reading frame and a conserved boundary sequence in the intron. Proc. Natl. Acad. Sci. USA 82, 3557-3561.
TOWNSEND, R., STANLEY, J., CURSON, S. J., and SHORT, M. N. (1985). Major polyadenylated transcripts of cassava latent virus and location of the gene encoding coat protein. EMBO J. 4, 33-37.