The complete nucleotide sequence of pepper mottle virus genomic RNA: Comparison of the encoded polyprotein with those of other sequenced potyviruses


VIROLOGY 191, 19-30 (1992)

The Complete Nucleotide Sequence of Pepper Mottle Virus Genomic RNA: Comparison of the Encoded Polyprotein with Those of Other Sequenced Potyviruses

VICKI BOWMAN VANCE,*,1 DELORES MOORE,* THOMAS H. TURPEN,† ALLAN BRACKER,‡ AND VICTORIA C. HOLLOWELL*

*Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208; †Biosource Genetics Corporation, Vacaville, California 95688; and ‡Lawrence Berkeley Laboratory, University of California at Berkeley, Berkeley, California 94720

Received May 7, 1992; accepted July 6, 1992

The complete nucleotide sequence of a pepper mottle virus isolate from California (PepMoV C) has been determined from cloned viral cDNAs. The PepMoV C genomic RNA is 9640 nucleotides excluding the poly(A) tail and contains a long open reading frame starting at nucleotide 168 and potentially encoding a polyprotein of 3068 amino acids. Comparison of the PepMoV C presumptive polyprotein with those of other sequenced members of the potyvirus group, including tobacco etch virus (TEV), tobacco vein mottling virus (TVMV), plum pox virus (PPV), and potato virus Y (PVY), allowed localization of putative protein cleavage sites. A similar analysis was used to determine the position of conserved viral protein-coding regions along the viral genomic RNA. These analyses confirm previous work indicating that genome organization is conserved among members of the genus Potyvirus. The localization of one PepMoV C gene product, the nuclear inclusion body protein a (NIa protein), was analyzed by expressing PepMoV cDNA deletion clones in bacteria and assaying for the appearance of mature-sized coat protein, a cleavage product of the NIa protease. Comparative sequence analyses of the putative PepMoV polyprotein with those of TEV, TVMV, PPV, and PVY served to identify regions of the potyviral polyproteins which have diverged within the genus, as well as highly conserved protein features which may play an important functional role in the potyviral life cycle. © 1992 Academic Press, Inc.

INTRODUCTION

Pepper mottle virus (PepMoV), a member of the genus Potyvirus, was originally described in 1972 as an atypical isolate of potato virus Y (PVY; Nelson and Wheeler, 1972; Zitter, 1972) and later reclassified as a distinct virus species based on classical studies of host range, cytopathology of infection, and serology of coat and cytoplasmic inclusion body proteins (Abdalla et al., 1991; deMejia et al., 1985; Nelson and Wheeler, 1978; Purcifull et al., 1973, 1975). We have recently reported the deduced amino acid sequence of the coat protein of a potyviral isolate from California which was definitively identified as PepMoV (PepMoV C) by a variety of classical methods (Vance et al., 1992). Comparative sequence analyses of the 3’ untranslated region of the genomic RNA and of the deduced coat protein amino acid sequence of this strain of PepMoV with those of PVY and other related potyviruses confirm the earlier serological and biological evidence that PepMoV and PVY are distinct viruses.

Analyses of complete genomic RNA sequences for individual potyviruses have advanced our understanding of potyviral genome structure and have suggested functions for several mature viral proteins. To date, the complete nucleotide sequences of four potyviruses, including tobacco etch virus (TEV), tobacco vein mottling virus (TVMV), plum pox virus (PPV), and the N strain of PVY, have been reported (Allison et al., 1986; Domier et al., 1986; Maiss et al., 1989; and Robaglia et al., 1989, respectively). For each of these potyviral species, the sequence studies reveal a single long open reading frame encoding a large polyprotein punctuated at conserved locations with protein cleavage signals which are similar but not identical among the various potyviral species. These sequence data are consistent with the previously deduced finding that mature potyviral proteins are expressed via proteolytic processing of a virally encoded polyprotein (for review see Dougherty and Carrington, 1988). Sequence similarity among the viral polyproteins indicates a common organization of the eight known potyviral genes along the viral genomic RNA. Furthermore, a number of protein motifs are found to be conserved among the sequenced viruses, suggesting that these sequences serve an important function shared by each of the potyviral isolates.

Here we report the complete nucleotide sequence of the genomic RNA of PepMoV C. The PepMoV genomic RNA, like those of the four other sequenced potyviruses, contains a long open reading frame that potentially encodes a polyprotein. Analyses of the deduced amino acid sequence of the PepMoV-encoded polyprotein and comparison with those of other sequenced potyviruses allowed us to assign proteolytic cleavage sites and locate genes on the PepMoV genomic RNA. PepMoV genome organization is similar to those of other potyviruses, and overall homology of the PepMoV polyprotein to those of PVY, TEV, TVMV, and PPV is relatively high. However, several regions within the encoded polyproteins have diverged significantly within the genus Potyvirus and may reflect basic functional differences between these individual viral species.

The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the Accession Number M96425.

1 To whom reprint requests should be addressed.

0042-6822/92 $5.00
Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

METHODS

PepMoV C isolate

The isolate of PepMoV used in this work was originally collected in 1974 from field-grown pepper in California and identified at that time as PVY. The original collection was by Dr. A. O. Paulus, Department of Plant Pathology, University of California, Riverside, California, and the isolate was subsequently propagated by Dr. L. G. Weathers (University of California, Riverside) by single aphid transmission. The RNA genome of this putative PVY isolate was molecularly cloned and a partial sequence (5’ and 3’ untranslated regions) was published in 1989 (Turpen, 1989). The sequence data reported here were derived from the same cDNA clones reported at that time. A number of procedures were used to determine definitively that this potyviral isolate is not PVY but a strain of PepMoV; these results are described elsewhere (Vance et al., 1992).

cDNA cloning

The preparation of purified virus, viral genomic RNA, cDNA clones, and deletion clones has been described (Turpen, 1989).

Nucleic acid sequencing

The majority of the nucleotide sequence of PepMoV C was determined from deletion subclones of two overlapping cDNAs which represent nearly the entire PepMoV C genomic RNA (Turpen, 1989). The 5’ end of the genomic RNA is represented by clone pZO177, which is 1.4 kb and contains all but 17 nucleotides from the 5’ end (Turpen, 1989). The second cDNA, pZO171, is 8.5 kb and overlaps pZO177 by several hundred nucleotides. This cDNA does not contain a poly(A) tract and lacks three terminal nucleotides (Turpen, 1989). Some sequence was obtained from a 5.9-kb cDNA (clone pZO174) which overlaps the 3’ end of pZO171 but includes a poly(A) tract of 11 nucleotides. Six independent clones containing the 3’ end of the PepMoV genome were sequenced and contained poly(A) tracts of 1, 8, 8, 11, and 52 nucleotides (Turpen, 1989). The original cDNAs are representative of the virion RNA as determined by comparing restriction maps of viral cDNA before and after cloning (Turpen, 1989).

The nucleotide sequence of the deletion subclones was determined by dideoxynucleotide chain termination reactions on denatured double-stranded templates using cloned T7 DNA polymerase exactly as described by the manufacturer (U.S. Biochemical, Cleveland, Ohio). Internal sequencing primers were synthesized and used as needed to complete sequence determination for both strands of the cDNA along their entire length. A portion of the nucleotide sequence (approximately 3000 nucleotides) was obtained commercially from Lark Sequencing Technologies, Inc. (Houston, TX). The 5’-terminal 17 nucleotides were not represented on any cDNA clone and were determined by sequencing directly from the viral RNA (VPg-linked) template (Turpen, 1989). In some cases, the VPg-linked template was pretreated with proteinase K before sequencing. The nucleotide sequences of the 5’ and 3’ untranslated regions of the PepMoV genome have been reported (Turpen, 1989).

Western blot analyses

Protein extracts from bacteria expressing PepMoV cDNA clones were analyzed for the presence of PepMoV coat protein sequences by Western blot analyses. Bacteria were pelleted and lysed in a solution of 2% sodium dodecyl sulfate (SDS), 5% β-mercaptoethanol, 10% glycerin, and 62.5 mM Tris-phosphate, pH 6.7. Samples from the various deletion subclones containing equal amounts of protein, as assayed by Bradford analyses (Bradford, 1976), were separated by SDS-polyacrylamide gel electrophoresis (Bruening et al., 1976). The gel was electroblotted and the protein blot processed exactly as previously described (Vance, 1991) using polyclonal rabbit antiserum specific for PepMoV coat protein as the primary antibody.

Computer analyses

All the computer analyses were performed using the Sequence Analysis Software Package (GCG package, version 7) by Genetics Computer Group, Inc. (Madison, WI). The alignment of potyviral polyprotein amino acid sequences was created using the program “Pileup,” which uses a simplification of the progressive alignment method of Feng and Doolittle (1987). The program “Plot Similarity” was used to calculate, at each position in the alignment produced by “Pileup,” the average similarity among the aligned sequences.
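The per-position averaging described for “Plot Similarity” can be illustrated with a minimal Python sketch. It is not the GCG implementation: it assumes simple identity scoring (1 for a match, 0 otherwise) in place of an amino acid comparison table, counts a gap as a mismatch, and runs on made-up aligned fragments. The function name `column_similarity` is hypothetical.

```python
from itertools import combinations


def column_similarity(aligned):
    """Average pairwise identity at each column of a multiple alignment.

    `aligned` is a list of equal-length strings with '-' marking gaps.
    Identity scoring is an assumption made for illustration only; it
    stands in for a real amino acid comparison table.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    pairs = list(combinations(range(len(aligned)), 2))
    scores = []
    for col in range(length):
        matches = 0
        for i, j in pairs:
            a, b = aligned[i][col], aligned[j][col]
            # A gap never counts as a match, even against another gap.
            if a == b and a != "-":
                matches += 1
        scores.append(matches / len(pairs))
    return scores


# Made-up fragments (not taken from any of the five polyproteins).
toy_alignment = ["GAVGSGKST", "GAVGSGKST", "GAIGSGKSS"]
profile = column_similarity(toy_alignment)
```

Plotting `profile` against column number gives a curve whose dips mark divergent stretches of the alignment and whose plateaus mark conserved ones; a windowed average would smooth it further.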

FIG. 1. The nucleotide sequence of PepMoV C genomic RNA and the deduced sequence of the putative polyprotein encoded by a long open reading frame beginning at nucleotide 168.

RESULTS AND DISCUSSION

Primary structure of the PepMoV C sequence

The assembled PepMoV C sequence is 9640 nucleotides, not including the poly(A) tail (Fig. 1). Computer analyses identified a single long open reading frame (ORF) beginning at position 168 and ending with a UGA termination codon at position 9372 of the virion (+) sense strand, followed by a 3’ noncoding region of 265 bases. The initiation codon at 168 is the first AUG in the sequence and is in the context UAAAAAGC, which is in reasonable agreement with the consensus sequence for plants (AACAAUGGC) (Lutcke et al., 1987). No other ORFs longer than 220 nucleotides were identified in the other two (+) strand reading frames or in any of the (-) strand reading frames. The overall base composition of the sequence is 31.7% A, 18.1% C, 23.1% G, and 27.0% U.

3’ and 5’ untranslated regions. The 5’ leader sequence of the PepMoV C sequence is 167 nucleotides and has a base composition significantly different from that of the overall sequence (36.9% A, 19.0% C, 8.9% G, and 35.1% U). The low content of G in the 5’ leader is typical of potyviruses and other plant viruses (Gallie et al., 1987). Both 3’ and 5’ untranslated regions of PepMoV C are homologous to the analogous regions of TEV, TVMV, PVY, and PPV, with levels of homology ranging from 33 to 45%. In contrast, the levels of nucleotide homology within the coding regions of the same viruses are much higher, ranging from 58 to 65%. Detailed analyses of sequence similarities of both 5’ and 3’ untranslated regions with those of related viruses have been published (Turpen, 1989; Vance et al., 1992).

Assignment of proteolytic cleavage sites

Mature potyviral proteins are expressed by proteolytic processing of the precursor polyprotein. These processing events are carried out by at least three known virally encoded proteases: the NIa protease, the helper component-protease (HC-pro), and a protease activity encoded in the 5’-terminal cistron (pro-1).

Putative NIa protease cleavage sites. The NIa protease of TEV has been shown to cleave at five locations along the TEV polyprotein, and each cleavage occurs at a conserved sequence of amino acids which constitutes the TEV NIa protease recognition signal (Dougherty et al., 1988). Similarly conserved sequences have been noted at the identified and proposed NIa protease cleavage sites of the other three sequenced potyviruses, with the exact consensus cleavage sequence somewhat variable among the viruses (Robaglia et al., 1989). Five PepMoV NIa protease cleavage sites have been identified by homology with those of the other four sequenced potyviruses. Each of the proposed PepMoV C NIa protease cleavage sites occurs between glutamine (Q) and either serine (S), glycine (G), or alanine (A), as demonstrated for nearly all potyviral NIa protease cleavages to date. The conserved amino acid sequence around each of the five proposed PepMoV NIa protease cleavage sites, as well as the deduced consensus cleavage site sequence for the PepMoV NIa protease, is shown in Fig. 2 (sections D and E). The consensus cleavage site of the PepMoV NIa protease is identical to that of PVY (Fig. 2E; Robaglia et al., 1989), suggesting a close relationship between these two viral proteases.

Putative HC-pro cleavage site. A sixth potyviral cleavage is proposed to occur by action of the potyviral HC-pro protease, a member of the cysteine-type family of proteases (Oh and Carrington, 1989). This protease has recently been shown to cleave between two glycine residues, and the cleavage site has been mapped to the carboxy-terminus of the HC-pro (Carrington et al., 1989a,b).
Two other amino acids located amino-terminal to the diglycine, a tyrosine (Y) at the P4 position and a valine (V) at the P2 position, are required for the cleavage (Carrington and Herndon, 1992) (Fig. 2C). Alignment of the potyviral polyproteins demonstrates that this region is highly conserved in all five sequenced potyviruses (Fig. 2C). The proposed cleavage in the PepMoV C polyprotein would occur at the diglycine sequence at amino acid positions 743-744.

Putative cleavage site at the amino-terminus of HC-pro. A seventh cleavage event defining the amino-terminus of the HC-pro has been shown to occur via a protease activity encoded in the 5’-terminal cistron of the TEV genomic RNA (Verchot et al., 1991). The cleavage site has been mapped in the TVMV polyprotein to occur between a phenylalanine (F) at position 256 and a serine (S) at 257 (Mavankal and Rhoads, 1991). The homologous region of the PepMoV polyprotein at positions 287-288 (Fig. 2B) has the sequence tyrosine (Y)/S, which is consistent with the proposed consensus sequence F(Y)/S.
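The residue rules cited in this section lend themselves to simple checks. The Python sketch below is illustrative only. The helper names are hypothetical, the rules are reduced to the dipeptide and P4/P2 requirements stated in the text (actual protease recognition also depends on neighboring residues), and the coordinate helper assumes that polyprotein residue 1 is encoded by the codon beginning at nucleotide 168.

```python
ORF_START = 168  # first base of the initiating AUG, as reported in the text


def codon_start(residue, orf_start=ORF_START):
    """Nucleotide coordinate of the first base of the codon for a residue.

    Residues are numbered from 1. Assumes an uninterrupted reading frame.
    """
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    return orf_start + 3 * (residue - 1)


def is_nia_dipeptide(p1, p1_prime):
    """Q at P1 followed by S, G, or A at P1' (the rule stated for NIa sites)."""
    return p1 == "Q" and p1_prime in {"S", "G", "A"}


def fits_pro1_consensus(p1, p1_prime):
    """F or Y at P1 followed by S at P1' (the proposed F(Y)/S consensus)."""
    return p1 in {"F", "Y"} and p1_prime == "S"


def fits_hcpro_rule(window):
    """Check a five-residue window written P4 P3 P2 P1 P1'.

    Requires Y at P4, V at P2, and cleavage between two glycines.
    """
    if len(window) != 5:
        raise ValueError("window must hold exactly five residues")
    p4, _p3, p2, p1, p1_prime = window
    return p4 == "Y" and p2 == "V" and p1 == "G" and p1_prime == "G"


# Position of the codon that follows a 3068-residue polyprotein.
stop_position = codon_start(3069)

# "YRVGG" is an illustrative input patterned on the Y-x-V-G/G rule.
example_ok = fits_hcpro_rule("YRVGG")
```

Under these assumptions `codon_start(3069)` evaluates to 9372, the position reported for the UGA termination codon, so the ORF start, the polyprotein length, and the stop position are mutually consistent. The same helper converts any proposed cleavage position in the polyprotein to a genome coordinate.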

Comparison of PepMoV C proteins with those of other potyviral proteins

Computer-assisted comparisons indicate that the deduced PepMoV C polyprotein shares homology of at least 65% with those of each of the four sequenced potyviruses. Furthermore, a number of motifs are conserved between the PepMoV C polyprotein sequence and those of the four other sequenced potyviruses, and these are discussed below. A proposed map of mature PepMoV C proteins along the deduced polyprotein, based on the comparative sequence analyses described below, is shown in Fig. 2A. The locations of proposed cleavage sites and conserved motifs are displayed.

[Fig. 2 panels: A, polyprotein map (Pro-1, HC-pro, Pro-3, CI, NIa, NIb, CP) marking the NTP binding motif, polymerase motif, HC-pro active site, NIa protease active site, metal binding site, and glycosylation site; B, Pro-1/HC-pro cleavage sites; C, HC-pro/Pro-3 cleavage sites; D, PepMoV NIa protease cleavage sites; E, NIa consensus cleavage sites.]

FIG. 2. Proposed map of the PepMoV polyprotein showing the location of putative cleavage sites and the individual viral polypeptides they demarcate. Specific motifs referred to in the text and their locations along the proposed PepMoV polyprotein map are indicated in A. A comparison of the amino acid sequence of the proposed pro-1 protease cleavage sites of PepMoV C, PVY, TVMV, PPV, and TEV is shown in B. A similar comparison of proposed cleavage sites for the HC-pro protease is shown in C. (D) The deduced amino acid sequence around the five proposed PepMoV NIa protease cleavage sites. The sequences of two degenerate consensus NIa protease cleavage sites which may delineate the viral 6K1 and VPg proteins are also shown in D. A consensus PepMoV NIa protease cleavage site based on conservation of amino acid sequence among the five sites is shown in E along with the NIa protease consensus cleavage sites of the four other sequenced potyviruses. The locations of the amino acid sequences within the various viral polyproteins in B-E are indicated in parentheses. In each case, the site of cleavage is indicated by an arrow or a diagonal line. Amino acids conserved in all five viral sequences are displayed in bold letters.

Nuclear inclusion body protein b (NIb). The most conserved of the individual potyviral proteins is the putative core polymerase (NIb protein), which displays >70% similarity among the five sequenced potyviruses. The PepMoV C NIb protein is 82.8% similar to the analogous protein in PVY and somewhat less similar to those of PPV, TVMV, and TEV, with percentage similarities of 72.8, 72.3, and 71.2, respectively. The NIb protein of potyviruses contains a version of the sequence consensus motif [S(T)GXXXTXXXNS(T) (18 to 37 AA) GDD] which is conserved in a variety of both animal and plant (+) stranded RNA viral RNA-dependent RNA polymerases (Kamer and Argos, 1984). This polymerase motif is present in the PepMoV C-deduced NIb protein in a position analogous to that of other sequenced potyviruses when the polyproteins are aligned based on sequence homology (positions 2586-2629 of the PepMoV polyprotein sequence).

Coat protein. The alignment of the coat protein sequence of PepMoV C with that of several other potyviruses, including two strains of PVY, has been reported (Vance et al., 1992). The potyviral coat protein sequences are highly conserved throughout most of the sequence but diverge in sequence and length at the amino-terminus. The coat protein sequence of our confirmed PepMoV isolate is approximately 80% similar to those of both strains of PVY and ranges from 71

to 76% similar to those of TEV, PPV, and TVMV. Previous work has demonstrated that strains of the same potyviral species display high levels of coat protein sequence homology (greater than 90% similar) and that this homology level is significantly reduced among distinct viral species and ranges from 40 to 80% similarity (Shukla and Ward, 1988, 1989). Based on the level of

coat protein homology, PepMoV C is a viral species distinct from, but closely related to PVY. Cytoplasmic inclusion body protein (Cl). The putative Cl protein sequence of PepMoV C is highly conserved between PepMoV C and PVY (-80% similarity) and significantly less conserved in comparison with the analogous protein sequences of PPV, TEV, and TVMV


(61-69% similarity). A nucleotide-binding motif (GAVGSGKST) located near the amino-terminus of the putative CI protein has been previously correlated with helicase-like proteins (Hodgman, 1988). This motif, located at positions 1241-1249 of the PepMoV C polyprotein, is strictly conserved in the analogous positions of all the sequenced potyviruses.

Nuclear inclusion body protein a (NIa). The potyviral NIa protein is thought to be multifunctional with an amino-terminal domain serving as the viral VPg (Dougherty and Parks, 1991; Murphy et al., 1990; Shahabuddin et al., 1988) and the carboxy-terminal domain functioning as a protease. An internal cleavage site separating these two NIa protein domains has been proposed in the TEV polyprotein between a glutamic acid residue at position 2037 and a glycine residue at position 2038 (Dougherty and Parks, 1991). This region of the TEV polyprotein has some features characteristic of the NIa protease consensus cleavage site, and the internal cleavage appears to be mediated by the activity of the TEV NIa protease (Dougherty and Parks, 1991). The analogous region of the PepMoV C polyprotein is a typical NIa protease cleavage site except for the substitution of a glutamic acid (E) for the consensus Q in the P1 position (Fig. 2D). The level of sequence homology of the PepMoV NIa protein with those of other potyviruses (80% similarity with PVY and 61-69% similarity with TEV, TVMV, and PPV) and the presence of the putative internal cleavage site suggest that the PepMoV NIa protein is organized into VPg and protease domains in a manner similar to that deduced for TEV and TVMV (Dougherty and Parks, 1991; Murphy et al., 1990). An amino acid triad (H, D, C) located near the carboxy-terminus of the NIa protein has been shown to constitute the active site residues of the NIa protease domain in TEV (Dougherty et al., 1989). Alignment of the potyviral polyprotein sequences shows that the location and spacing of these three amino acids are strictly conserved in all the sequenced potyviral NIa proteases including that of PepMoV C. Thus, the amino acid triad is likely to serve the same function in these related proteases. The putative localization of the PepMoV NIa protease on the deduced polyprotein was analyzed by a functional assay. In this analysis, a nested set of deletion cDNA clones encoding various 3’ subsets of the PepMoV genome was expressed in bacteria. The presence of the PepMoV NIa protease coding region on each particular cDNA was assayed by the ability of the encoded polyprotein to accurately cleave the PepMoV coat protein sequence encoded at the 3’ end of each expressed cDNA. Accurate cleavage by the NIa protease was assayed by the appearance of a protein

FIG. 3. Map of deletion clones of PepMoV genomic RNA and Western blot of deleted proteins expressed in bacteria. The seven cDNA deletion clones (1-7) and their positions along the PepMoV genomic RNA sequence are shown in B. The coding regions referred to as pro-1 and pro-3 in the text are labeled P-1 and P-3, respectively, in the polyprotein map. (A) Western blot of proteins isolated from bacteria expressing the various deletion cDNAs and reacted with antiserum specific for the PepMoV coat protein. Lanes 1-7 show proteins from bacteria expressing cDNA clones 1-7, respectively. The position of PepMoV coat protein from purified virions is shown in the lane marked V.

product which comigrated with authentic PepMoV coat protein and was immunoreactive with polyclonal antiserum specific for PepMoV coat protein. Figure 3B shows the structure of each deletion cDNA clone and the localization of the PepMoV gene products as deduced by homology with other sequenced potyviral polyproteins. Figure 3A shows a Western blot of proteins derived from expression of each of the seven deletion clones in E. coli and probed with antiserum specific to PepMoV coat protein sequences. Deletion clones 1-4 all show the expression of mature-sized PepMoV coat protein (Fig. 3A, lanes 1-4, show proteins from deletion clones 1-4, respectively). However, clones 5 and 6, in which most or all of the sequence homologous to potyviral NIa proteases is deleted, express coat protein sequences as part of a polypeptide larger than mature coat protein (Fig. 3A, lanes 5 and 6, respectively). Deletion clone 7, which is deleted into the coat protein sequence, expresses coat protein sequences as a polypeptide smaller than mature coat protein (Fig. 3A, lane 7). Thus the ability of expressed cDNAs to produce mature-sized PepMoV coat protein

FIG. 4. Similarity plot displaying the degree of similarity at different positions along an alignment of the putative polyproteins of PepMoV, PVY, TVMV, TEV, and PPV. Viral polyproteins were aligned using the progressive alignment method of Feng and Doolittle (1987) and the average similarity among members of the alignment was calculated at each position using a sliding 10-amino-acid window for the comparison. The average similarity across the entire alignment is shown as a dotted line and similarity at each position is plotted along the line. A perfect symbol match has a value of 1.5 and non-matches, depending on evolutionary distance, have values less than 1.5. The position of putative PepMoV proteins along the alignment is shown below the plot. Coding regions referred to as pro-1 and pro-3 in the text are labeled P-1 and P-3, respectively, in the diagram.
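The sliding-window calculation described in the Fig. 4 legend can be illustrated with a short sketch. This is not the program the authors used. It assumes a simplified scoring in which identical residues score 1.5 (the perfect-match value given in the legend) and every other pair, including any pair involving a gap, scores 0, whereas the legend states that real non-matches receive graded values below 1.5. The names `column_similarity` and `similarity_profile` are hypothetical.

```python
from itertools import combinations

MATCH_SCORE = 1.5      # value of a perfect match, as stated in the Fig. 4 legend
MISMATCH_SCORE = 0.0   # assumption: the real comparison table grades mismatches

def column_similarity(column):
    """Mean pairwise score of the residues in one alignment column."""
    pairs = list(combinations(column, 2))
    total = sum(
        MATCH_SCORE if a == b and a != "-" else MISMATCH_SCORE
        for a, b in pairs
    )
    return total / len(pairs)

def similarity_profile(alignment, window=10):
    """Sliding-window mean of per-column similarity over aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    if len(alignment) < 2 or window < 1 or window > length:
        raise ValueError("need >= 2 sequences and 1 <= window <= alignment length")
    per_column = [column_similarity(col) for col in zip(*alignment)]
    return [
        sum(per_column[i:i + window]) / window
        for i in range(length - window + 1)
    ]

if __name__ == "__main__":
    # Toy alignment of three four-residue sequences, window of two columns.
    print(similarity_profile(["ACDE", "ACDF", "ACGF"], window=2))
```

For the toy alignment, the two fully conserved columns score 1.5 and the two columns with one differing residue score 0.5, so the windowed profile is 1.5, 1.0, 0.5. Substituting a graded amino acid comparison table for the two constants would bring the sketch closer to the plot described in the legend.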

is correlated with presence of NIa protease sequences as deduced by comparative sequence homology.

6K proteins. A small polypeptide termed “6K protein” is located between the CI body protein and the NIa protease on the potyviral polyprotein. This protein in PepMoV is 71.1% similar to the analogous protein in PVY and 59-63% similar to those in TEV, TVMV, and PPV. The amino-terminal cleavage site defining this protein in the sequenced PPV isolate is unusual in that it contains tyrosine in the P’1 position (+1 relative to the peptide bond cleavage) instead of the consensus S, A, or G (Maiss et al., 1989). However, recent evidence suggests that the PPV 6K site is processed (Garcia et al., 1990). The 6K protein was once thought to encode the potyviral VPg, but a variety of recent evidence suggests that this is not the case (Murphy et al., 1990; Shahabuddin et al., 1988) and the function of the 6K protein is currently unknown. A second small protein termed “6K,”, located to the amino-terminus of the CI protein, has been proposed for PPV based on the presence of a consensus NIa protease cleavage site (VVHQ/S) at position 1113-1117 of the PPV polyprotein (Garcia et al., 1989). Alignment of the deduced polyproteins of PepMoV, PVY, TEV, and TVMV with that of PPV indicates that the analogous regions of these viral polyproteins contain degenerate NIa protease cleavage sites, each differing at one amino acid from the consensus cleavage site of the viral NIa protease. In PepMoV, the analogous region contains the sequence VVHQR, with an arginine (R) in the putative P’1 position instead of the consensus A, S, or G (Fig. 2D). It is not known if this potential NIa protease cleavage site is used in any of the viral isolates.
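The notion of a degenerate site, one that departs from the consensus at a single position, can be made concrete with a small check. The sketch below is a simplification and not the authors' analysis: it considers only the two positions flanking the scissile bond, using the consensus residues named in the text (Q at P1; S, A, or G at P'1), and it applies one shared consensus to all viruses, whereas the full virus-specific consensus sites span more positions (Fig. 2E). The names `CONSENSUS`, `site_deviations`, and `classify_site` are hypothetical.

```python
# Consensus residues as described in the text: Q at P1 and S, A, or G at P'1.
# Assumption: only these two positions are checked, and the same consensus is
# applied to every virus.
CONSENSUS = {
    "P1": {"Q"},
    "P'1": {"S", "A", "G"},
}

def site_deviations(p1, p1_prime):
    """List the positions at which a candidate site departs from the consensus."""
    observed = {"P1": p1, "P'1": p1_prime}
    return [pos for pos, allowed in CONSENSUS.items() if observed[pos] not in allowed]

def classify_site(p1, p1_prime):
    """Label a site by its number of deviations (0, 1, or 2)."""
    labels = ("consensus", "degenerate", "non-consensus")
    return labels[len(site_deviations(p1, p1_prime))]

# Sites mentioned in the text, written as (P1, P'1).
examples = {
    "PPV VVHQ/S": ("Q", "S"),
    "PepMoV VVHQ/R": ("Q", "R"),
    "TEV internal NIa site E/G": ("E", "G"),
}

if __name__ == "__main__":
    for name, (p1, p1_prime) in examples.items():
        print(name, classify_site(p1, p1_prime), site_deviations(p1, p1_prime))
```

Under these assumptions the PPV site is classified as consensus, while the PepMoV VVHQ/R site and the TEV internal site each deviate at one position (P'1 and P1, respectively). Whether such a degenerate site is actually cleaved is an experimental question that a check of this kind cannot answer.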

Conserved motifs in the amino-terminal portion of the polyprotein. The sequences at the amino-terminal end of the polyprotein are much less conserved among the sequenced potyviruses than are those at the carboxy-terminus. However, at least three motifs in this region are conserved in TEV, TVMV, PPV, and PVY polyproteins and this conservation may now be extended to PepMoV C. The first motif is a “zinc finger”-like metal binding motif [C-8AA-C-13AA-C-4AA-C-2AA-C] which was first noted in the PVY sequence (Robaglia et al., 1989) and is strictly conserved in all five sequenced potyviruses. The second conserved motif is the [C-72AA-H] which is present within the putative HC-pro. This motif is required for protease activity of the HC-pro in TEV (Oh and Carrington, 1989) and is tightly conserved among all the sequenced potyviruses and thus may serve the same function in these closely related viruses. The HC-pro of some potyviruses is known to be glycosylated (Berger and Pirone, 1986) and a number of possible glycosylation motifs have been identified from the deduced amino acid sequences of the putative HC-pro region of the genome. One of these glycosylation motifs, at position 453-455 of the PepMoV polyprotein, is conserved in the analogous regions of all the sequenced potyviruses and therefore may be important for the function of this viral protein.

Overall homology among potyviral polyproteins

Comparison of the complete amino acid sequences of PepMoV, PVY, TVMV, TEV, and PPV was used to determine the overall homology levels among the five sequenced potyviruses and computer-assisted alignment of the sequences was used to plot the similarity of different regions along the polyprotein (Fig. 4). The percentage of overall similarity is highest between PepMoV C and PVY polyproteins at 76.3% and significantly lower between PepMoV C and PPV, TVMV, and TEV at 67.5, 65.1, and 65.2%, respectively. The similarity plot analysis indicates that the homology level among the five viruses varies along the polyprotein. In general, the carboxy-terminal 2/3 of the polyprotein, starting at the CI protein and continuing through to the coat protein, is highly conserved among all the potyviruses. The putative HC-pro region is also relatively highly conserved. In contrast, the putative pro-1 and pro-3 proteins and the amino-termini of the coat proteins are much less conserved. These diverged regions of the potyviral polyprotein may function in processes which differentiate the potyviral species, such as host range determination.

ACKNOWLEDGMENTS

The authors gratefully acknowledge Dr. Lewis Bowman for critical reading of the manuscript. This work was supported in part by Sandoz Crop Protection Corp. and in part by Grant DBM-8817233 from the National Science Foundation.

REFERENCES

ABDALLA, O. A., DESJARDINS, P. R., and DODDS, J. A. (1991). Identification, disease incidence, and distribution of viruses infecting peppers in California. Plant Dis. 75, 1019-1023.
ALLISON, R., JOHNSTON, R. E., and DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: Evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
BERGER, P. H., and PIRONE, T. P. (1986). Evidence that potyvirus helper component is a glycoprotein. Phytopathology 76, 1063.
BRADFORD, M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248-252.
BRUENING, G., BEACHY, R., SCALLA, R., and ZAITLIN, M. (1976). In vivo and in vitro translation of the ribonucleic acids of a cowpea strain of tobacco mosaic virus. Virology 71, 498-517.
CARRINGTON, J. C., CARY, S. M., PARKS, T. D., and DOUGHERTY, W. G. (1989a). A second proteinase encoded by a plant potyvirus genome. EMBO J. 8, 365-370.
CARRINGTON, J. C., FREED, D. D., and SANDERS, T. S. (1989b). Autoproteolytic processing of potyvirus proteinase HC-Pro in Escherichia coli and in vitro. J. Virol. 63, 4459-4463.
CARRINGTON, J. C., and HERNDON, K. L. (1992). Characterization of the potyviral HC-Pro autoproteolytic cleavage site. Virology 167, 308-315.
DEMEJIA, M. V. G., HIEBERT, E., PURCIFULL, D. E., THORNBURY, D. W., and PIRONE, T. P. (1985). Identification of potyviral amorphous inclusion protein as a nonstructural, virus-specific protein related to helper component. Virology 142, 34-43.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F. E., LOMONOSSOFF, G. P., SHAW, J. G., and RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Res. 14, 5417-5430.
DOUGHERTY, W. G., and CARRINGTON, J. C. (1988). Expression and function of the potyviral gene products. Annu. Rev. Phytopathol. 26, 123-143.
DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M., and PARKS, T. D. (1988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO J. 7, 1281-1287.
DOUGHERTY, W. G., and PARKS, T. D. (1991). Post-translational processing of the tobacco etch virus 49-kDa small nuclear inclusion polyprotein: Identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456.
DOUGHERTY, W. G., PARKS, T. D., CARY, S. M., BAZAN, J. F., and FLETTERICK, R. J. (1989). Characterization of the catalytic residues of the tobacco etch virus 49-kDa proteinase. Virology 172, 302-310.
FENG, D., and DOOLITTLE, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
GALLIE, D. R., SLEAT, D. E., WATTS, J. W., TURNER, P. C., and WILSON, T. M. A. (1987). A comparison of eukaryotic viral 5’-leader sequences as enhancers of mRNA expression in vivo. Nucleic Acids Res. 15, 8693-8711.
GARCIA, J. A., LAIN, S., CERVERA, M. T., RIECHMANN, J. L., and MARTIN, M. T. (1990). Mutational analysis of plum pox potyvirus polyprotein processing by the NIa proteinase in Escherichia coli. J. Gen. Virol. 71, 2773-2779.
GARCIA, J. A., RIECHMANN, J. L., and LAIN, S. (1989). Artificial cleavage site recognized by the plum pox potyvirus in Escherichia coli. J. Virol. 63, 2457-2460.
HODGMAN, T. C. (1988). A conserved NTP-motif in putative helicases. Nature (London) 333, 22-23.
LÜTCKE, H. A., CHOW, K. C., MICKEL, F. S., MOSS, K. A., KERN, H. F., and SCHEELE, G. A. (1987). Selection for AUG initiation codons differs in plants and animals. EMBO J. 6, 43-48.
KAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.
MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D., and KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. J. Gen. Virol. 70, 513-524.
MAVANKAL, G., and RHOADS, R. E. (1991). In vitro cleavage near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731.
MURPHY, J. F., RHOADS, R. E., HUNT, A. G., and SHAW, J. G. (1990). The VPg of tobacco etch virus RNA is the 49-kDa proteinase or the N-terminal 24-kDa part of the proteinase. Virology 178, 285-288.
NELSON, M. R., and WHEELER, R. E. (1972). A new virus disease of pepper in Arizona. Plant Dis. Rep. 56, 731-735.
NELSON, M. R., and WHEELER, R. E. (1978). Biological and serological characterization and separation of potyviruses that infect peppers. Phytopathology 68, 979-984.
OH, C-S., and CARRINGTON, J. C. (1989). Identification of the essential residues in potyvirus proteinase HC-Pro by site-directed mutagenesis. Virology 173, 692-699.
PURCIFULL, D. E., HIEBERT, E., and MCDONALD, J. G. (1973). Immunochemical specificity of cytoplasmic inclusions induced by viruses in the potato Y group. Virology 55, 275-279.
PURCIFULL, D. E., ZITTER, T. A., and HIEBERT, E. (1975). Morphology, host range and serological relationships of pepper mottle virus. Phytopathology 65, 559-562.
ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S., and CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. J. Gen. Virol. 70, 935-947.
SHAHABUDDIN, M., SHAW, J. G., and RHOADS, R. E. (1988). Mapping of the tobacco vein mottling virus VPg cistron. Virology 163, 635-637.
SHUKLA, D. D., and WARD, C. W. (1988). Amino acid sequence homology of coat proteins as a basis for identification and classification of the potyvirus group. J. Gen. Virol. 69, 2703-2710.
SHUKLA, D. D., and WARD, C. W. (1989). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Arch. Virol. 106, 171-200.
TURPEN, T. (1989). Molecular cloning of a potato virus Y genome: Nucleotide sequence homology in non-coding regions of potyviruses. J. Gen. Virol. 70, 1951-1960.
VANCE, V. B. (1991). Replication of potato virus X RNA is altered in coinfections with potato virus Y. Virology 182, 486-494.
VANCE, V. B., JORDAN, R., EDWARDSON, J. R., CHRISTIE, R., PURCIFULL, D. E., TURPEN, T., and FALK, B. (1992). Evidence that pepper mottle virus and potato virus Y are distinct viruses: Analyses of the coat protein and 3’ untranslated sequence of a California isolate of pepper mottle virus. Arch. Virol. [Supplement 5], 337-345.
VERCHOT, J., KOONIN, E. V., and CARRINGTON, J. C. (1991). The 35-kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded proteinase. Virology 185, 527-535.
ZITTER, T. A. (1972). Naturally occurring pepper virus strains in Florida. Plant Dis. Rep. 56, 586-590.