VIROLOGY 191, 19-30 (1992)
The Complete Nucleotide Sequence of Pepper Mottle Virus Genomic RNA: Comparison of the Encoded Polyprotein with Those of Other Sequenced Potyviruses
VICKI BOWMAN VANCE,*,1 DELORES MOORE,* THOMAS H. TURPEN,† ALLAN BRACKER,‡ AND VICTORIA C. HOLLOWELL*

*Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208; †Biosource Genetics Corporation, Vacaville, California 95688; and ‡Lawrence Berkeley Laboratory, University of California at Berkeley, Berkeley, California 94720
Received May 7, 1992; accepted July 6, 1992

The complete nucleotide sequence of a pepper mottle virus isolate from California (PepMoV C) has been determined from cloned viral cDNAs. The PepMoV C genomic RNA is 9640 nucleotides long, excluding the poly(A) tail, and contains a long open reading frame starting at nucleotide 168 and potentially encoding a polyprotein of 3068 amino acids. Comparison of the PepMoV C presumptive polyprotein with those of other sequenced members of the potyvirus group, including tobacco etch virus (TEV), tobacco vein mottling virus (TVMV), plum pox virus (PPV), and potato virus Y (PVY), allowed localization of putative protein cleavage sites. A similar analysis was used to determine the position of conserved viral protein-coding regions along the viral genomic RNA. These analyses confirm previous work indicating that genome organization is conserved among members of the genus Potyvirus. The localization of one PepMoV C gene product, the nuclear inclusion body protein a (NIa protein), was analyzed by expressing PepMoV cDNA deletion clones in bacteria and assaying for appearance of mature-sized coat protein, a cleavage product of the NIa protease. Comparative sequence analyses of the putative PepMoV polyprotein with those of TEV, TVMV, PPV, and PVY served to identify regions of the potyviral polyproteins which have diverged within the genus, as well as highly conserved protein features which may play an important functional role in the potyviral life cycle. © 1992 Academic Press, Inc.
INTRODUCTION

Pepper mottle virus (PepMoV), a member of the genus Potyvirus, was originally described in 1972 as an atypical isolate of potato virus Y (PVY; Nelson and Wheeler, 1972; Zitter, 1972) and later reclassified as a distinct virus species based on classical studies of host range, cytopathology of infection, and serology of coat and cytoplasmic inclusion body proteins (Abdalla et al., 1991; de Mejia et al., 1985; Nelson and Wheeler, 1978; Purcifull et al., 1973, 1975). We have recently reported the deduced amino acid sequence of the coat protein of a potyviral isolate from California which was definitively identified as PepMoV (PepMoV C) by a variety of classical methods (Vance et al., 1992). Comparative sequence analyses of the 3’ untranslated region of the genomic RNA and of the deduced coat protein amino acid sequence of this strain of PepMoV with those of PVY and other related potyviruses confirm the earlier serological and biological evidence that PepMoV and PVY are distinct viruses.

Analyses of complete genomic RNA sequences for individual potyviruses have advanced our understanding of potyviral genome structure and have suggested functions for several mature viral proteins. To date, the complete nucleotide sequences of four potyviruses, tobacco etch virus (TEV), tobacco vein mottling virus (TVMV), plum pox virus (PPV), and the N strain of PVY, have been reported (Allison et al., 1986; Domier et al., 1986; Maiss et al., 1989; and Robaglia et al., 1989, respectively). For each of these potyviral species, the sequence studies reveal a single long open reading frame encoding a large polyprotein punctuated at conserved locations with protein cleavage signals which are similar but not identical among the various potyviral species. These sequence data are consistent with the previously deduced finding that mature potyviral proteins are expressed via proteolytic processing of a virally encoded polyprotein (for review see Dougherty and Carrington, 1988). Sequence similarity among the viral polyproteins indicates a common organization of the eight known potyviral genes along the viral genomic RNA. Furthermore, a number of protein motifs are found to be conserved among the sequenced viruses, suggesting that these sequences serve an important function shared by each of the potyviral isolates.

Here we report the complete nucleotide sequence of the genomic RNA of PepMoV C. The PepMoV genomic RNA, like those of the four other sequenced potyviruses, contains a long open reading frame that potentially encodes a polyprotein. Analyses of the deduced amino acid sequence of the PepMoV-encoded polyprotein and comparison with those of other sequenced potyviruses allowed us to assign proteolytic cleavage sites and locate genes on the PepMoV genomic RNA. PepMoV genome organization is similar to that of other potyviruses, and overall homology of the PepMoV polyprotein to those of PVY, TEV, TVMV, and PPV is relatively high. However, several regions within the encoded polyproteins have diverged significantly within the genus Potyvirus and may reflect basic functional differences between these individual viral species.

The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the Accession Number M96425.

1 To whom reprint requests should be addressed.

0042-6822/92 $5.00. Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.
METHODS

PepMoV C isolate

The isolate of PepMoV used in this work was originally collected in 1974 from field-grown pepper in California and identified at that time as PVY. The original collection was made by Dr. A. O. Paulus, Department of Plant Pathology, University of California, Riverside, California, and the isolate was subsequently propagated by Dr. L. G. Weathers (University of California, Riverside) by single aphid transmission. The RNA genome of this putative PVY isolate was molecularly cloned, and a partial sequence (5’ and 3’ untranslated regions) was published in 1989 (Turpen, 1989). The sequence data reported here were derived from the same cDNA clones reported at that time. A number of procedures were used to determine definitively that this potyviral isolate is not PVY but a strain of PepMoV; these results are described elsewhere (Vance et al., 1992).
cDNA cloning

The preparation of purified virus, viral genomic RNA, cDNA clones, and deletion clones has been described (Turpen, 1989).
Nucleic acid sequencing

The majority of the nucleotide sequence of PepMoV C was determined from deletion subclones of two overlapping cDNAs which represent nearly the entire PepMoV C genomic RNA (Turpen, 1989). The 5’ end of the genomic RNA is represented by clone pZO177, which is 1.4 kb and contains all but 17 nucleotides from the 5’ end (Turpen, 1989). The second cDNA, pZO171, is 8.5 kb and overlaps pZO177 by several hundred nucleotides. This cDNA does not contain a poly(A) tract and lacks three terminal nucleotides (Turpen, 1989). Some sequence was obtained from a 5.9-kb cDNA (clone pZO174) which overlaps the 3’ end of pZO171 but includes a poly(A) tract of 11 nucleotides. Six independent clones containing the 3’ end of the PepMoV genome were sequenced and contained poly(A) tracts of 1, 8, 8, 11, and 52 nucleotides (Turpen, 1989). The original cDNAs are representative of the virion RNA as determined by comparing restriction maps of viral cDNA before and after cloning (Turpen, 1989).

The nucleotide sequence of the deletion subclones was determined by dideoxynucleotide chain termination reactions on denatured double-stranded templates using cloned T7 DNA polymerase exactly as described by the manufacturer (U.S. Biochemical, Cleveland, Ohio). Internal sequencing primers were synthesized and used as needed to complete sequence determination for both strands of the cDNA along their entire length. A portion of the nucleotide sequence (approximately 3000 nucleotides) was obtained commercially from Lark Sequencing Technologies, Inc. (Houston, TX). The 5’-terminal 17 nucleotides were not represented on any cDNA clone and were determined by sequencing directly from the viral RNA (VPg-linked) template (Turpen, 1989). In some cases, the VPg-linked template was pretreated with proteinase K before sequencing. The nucleotide sequences of the 5’ and 3’ untranslated regions of the PepMoV genome have been reported (Turpen, 1989).
Western blot analyses

Protein extracts from bacteria expressing PepMoV cDNA clones were analyzed for the presence of PepMoV coat protein sequences by Western blot analyses. Bacteria were pelleted and lysed in a solution of 2% sodium dodecyl sulfate (SDS), 5% β-mercaptoethanol, 10% glycerin, and 62.5 mM Tris-phosphate, pH 6.7. Samples from the various deletion subclones, containing equal amounts of protein as assayed by Bradford analyses (Bradford, 1976), were separated by SDS-polyacrylamide gel electrophoresis (Bruening et al., 1976). The gel was electroblotted and the protein blot processed exactly as previously described (Vance, 1991), using polyclonal rabbit antiserum specific for PepMoV coat protein as the primary antibody.
Computer analyses

All computer analyses were performed using the Sequence Analysis Software Package (GCG package, version 7) by Genetics Computer, Inc. (Madison, WI). The alignment of potyviral polyprotein amino acid sequences was created using the program “Pileup,” which uses a simplification of the progressive alignment method of Feng and Doolittle (1987). The program “Plot Similarity” was used to calculate the average similarity among members of the alignment specified by “Pileup” at each position in the alignment.
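The idea of a per-position similarity profile can be illustrated with a short script. The following is only a minimal sketch and is not the GCG “Plot Similarity” algorithm: it scores each alignment column by simple pairwise identity rather than by a residue comparison table, and the function names, the `window` parameter, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def column_identity(column):
    """Fraction of sequence pairs with the same residue in one column.

    Assumption for this sketch: a gap ('-') never counts as a match.
    """
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if a == b and a != "-")
    return hits / len(pairs)


def similarity_profile(aligned, window=1):
    """Average pairwise identity at each alignment position.

    `aligned` is a list of equal-length, already aligned sequences.
    `window` is the width of a centred running average (1 = no smoothing).
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n = lengths.pop()
    raw = [column_identity([seq[i] for seq in aligned]) for i in range(n)]
    half = window // 2
    profile = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        profile.append(sum(raw[lo:hi]) / (hi - lo))
    return profile


# Invented toy alignment, not real potyviral sequence data.
toy_alignment = ["ACDEF", "ACDEF", "AGDE-"]
print(similarity_profile(toy_alignment))
```

Plotting such a profile along an alignment of the five polyproteins would be expected to show dips in diverged regions and plateaus in conserved ones; a scoring matrix could replace the identity test if graded similarity is wanted.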
FIG. 1. The nucleotide sequence of PepMoV C genomic RNA and the deduced sequence of the putative polyprotein encoded by a long open reading frame beginning at nucleotide 168.
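The open reading frame coordinates and base composition of a genomic RNA sequence such as the one in Fig. 1 can be checked with a few lines of code. The following is a minimal sketch under stated assumptions, not the GCG procedure used for the analyses in this paper: it scans only the three (+) strand frames, counts only ORFs that start with AUG and end at an in-frame stop codon, and uses hypothetical function names and an invented toy sequence.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}


def base_composition(rna):
    """Percentage of A, C, G, and U in an RNA sequence."""
    if not rna:
        raise ValueError("empty sequence")
    counts = Counter(rna)
    return {base: 100.0 * counts[base] / len(rna) for base in "ACGU"}


def longest_orf(rna):
    """Longest AUG-initiated ORF in the three (+) strand frames.

    Returns (start, stop_codon_start, n_codons) with 1-based positions,
    or None if no AUG is followed by an in-frame stop codon.
    n_codons counts sense codons only (the stop codon is excluded).
    """
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if best is None or n_codons > best[2]:
                    best = (start + 1, i + 1, n_codons)
                start = None
    return best


# Invented toy sequence, not PepMoV data.
toy_rna = "GGAUGGCUACAUGAUU"
print(longest_orf(toy_rna))
print(base_composition(toy_rna))
```

As a consistency check on the reported coordinates, an ORF that begins at nucleotide 168 and encodes 3068 amino acids places its termination codon at 168 + 3 × 3068 = 9372.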
RESULTS AND DISCUSSION

Primary structure of the PepMoV C sequence

The assembled PepMoV C sequence is 9640 nucleotides, not including the poly(A) tail (Fig. 1). Computer analyses identified a single long open reading frame (ORF) beginning at position 168 and ending with a UGA termination codon at position 9372 of the virion (+) sense strand, followed by a 3’ noncoding region of 265 bases. The initiation codon at 168 is the first AUG in the sequence and is in the context UAAAAAGC, which is in reasonable agreement with the consensus sequence for plants (AACAAUGGC) (Lütcke et al., 1987). No other ORFs longer than 220 nucleotides were identified in the other two (+) strand reading frames or in any of the (-) strand reading frames. The overall base composition of the sequence is 31.7% A, 18.1% C, 23.1% G, and 27.0% U.

3’ and 5’ untranslated regions. The 5’ leader sequence of the PepMoV C sequence is 167 nucleotides and has a base composition significantly different from that of the overall sequence (36.9% A, 19.0% C, 8.9% G, and 35.1% U). The low content of G in the 5’ leader is typical of potyviruses and other plant viruses (Gallie et al., 1987). Both 3’ and 5’ untranslated regions of PepMoV C are homologous to the analogous regions of TEV, TVMV, PVY, and PPV, with levels of homology ranging from 33 to 45%. In contrast, the levels of nucleotide homology within the coding regions of the same viruses are much higher, ranging from 58 to 65%. Detailed analyses of sequence similarities of both 5’ and 3’ untranslated regions with those of related viruses have been published (Turpen, 1989; Vance et al., 1992).

Assignment of proteolytic cleavage sites
Mature potyviral proteins are expressed by proteolytic processing of the precursor polyprotein. These processing events are carried out by at least three known virally encoded proteases: the NIa protease, the helper component-protease (HC-pro), and a protease activity encoded in the 5’-terminal cistron (pro-1).

Putative NIa protease cleavage sites. The NIa protease of TEV has been shown to cleave at five locations along the TEV polyprotein, and each cleavage occurs at a conserved sequence of amino acids which constitutes the TEV NIa protease recognition signal (Dougherty et al., 1988). Similarly conserved sequences have been noted at the identified and proposed NIa protease cleavage sites of the other three sequenced potyviruses, with the exact consensus cleavage sequence somewhat variable among the viruses (Robaglia et al., 1989). Five PepMoV NIa protease cleavage sites have been identified by homology with those of the other four sequenced potyviruses. Each of the proposed PepMoV C NIa protease cleavage sites occurs between glutamine (Q) and either serine (S), glycine (G), or alanine (A), as demonstrated for nearly all potyviral NIa protease cleavages to date. The conserved amino acid sequence around each of the five proposed PepMoV NIa protease cleavage sites, as well as the deduced consensus cleavage site sequence for the PepMoV NIa protease, is shown in Fig. 2 (sections D and E). The consensus cleavage site of the PepMoV NIa protease is identical to that of PVY (Fig. 2E; Robaglia et al., 1989), suggesting a close relationship between these two viral proteases.

Putative HC-pro cleavage site. A sixth potyviral cleavage is proposed to occur by action of the potyviral HC-pro protease, a member of the cysteine-type family of proteases (Oh and Carrington, 1989). This protease has recently been shown to cleave between two glycine residues, and the cleavage site has been mapped to the carboxy-terminus of the HC-pro (Carrington et al., 1989a,b). Two other amino acids located amino-terminal to the diglycine, a tyrosine (Y) at the P4 position and a valine (V) at the P2 position, are required for the cleavage (Carrington and Herndon, 1992) (Fig. 2C). Alignment of the potyviral polyproteins demonstrates that this region is highly conserved in all five sequenced potyviruses (Fig. 2C). The proposed cleavage in the PepMoV C polyprotein would occur at the diglycine sequence at amino acid position 743-744.

Putative cleavage site at the amino-terminus of HC-pro. A seventh cleavage event defining the amino-terminus of the HC-pro has been shown to occur via a protease activity encoded in the 5’-terminal cistron of the TEV genomic RNA (Verchot et al., 1991). The cleavage site has been mapped in the TVMV polyprotein to occur between a phenylalanine (F) at position 256 and a serine (S) at 257 (Mavankal and Rhoads, 1991). The homologous region of the PepMoV polyprotein at position 287-288 (Fig. 2B) has the sequence tyrosine (Y)/S, which is consistent with the proposed consensus sequence F(Y)/S.
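The cleavage-site rules described above lend themselves to a simple pattern search over a polyprotein sequence. The following is a minimal sketch, assuming only the dipeptide rule Q/(S, G, or A) for NIa sites and the pattern Y-x-V-G/G for the HC-pro site; it is not the alignment-based procedure used in this paper. The function names, the `offset` parameter, and the short test fragments are hypothetical.

```python
import re

# Q at P1 followed by S, G, or A at P1' (lookahead keeps P1' out of the match).
NIA_SITE = re.compile(r"Q(?=[SGA])")
# Y at P4, any residue at P3, V at P2, G at P1, followed by G at P1'.
HCPRO_SITE = re.compile(r"Y.VG(?=G)")


def nia_candidates(protein, offset=0):
    """1-based positions of Q residues that are followed by S, G, or A."""
    return [m.start() + 1 + offset for m in NIA_SITE.finditer(protein)]


def hcpro_candidates(protein, offset=0):
    """1-based (P1, P1') position pairs for Y-x-V-G/G matches."""
    return [
        (m.start() + 4 + offset, m.start() + 5 + offset)
        for m in HCPRO_SITE.finditer(protein)
    ]


# Illustrative fragments only; `offset` numbers the second one as if it
# began at residue 739 of a longer polyprotein.
print(nia_candidates("AAVRHQSLDQK"))
print(hcpro_candidates("HYRVGGIV", offset=738))
```

A dipeptide rule alone will report many chance matches in a polyprotein of about 3000 residues, so candidates from such a scan would still need to be filtered by the upstream consensus residues and by alignment with the sites known in other potyviruses.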
Comparison of PepMoV C proteins with those of other potyviral proteins
Computer-assisted comparisons indicate that the deduced PepMoV C polyprotein shares homology of at least 65% with those of each of the four sequenced potyviruses. Furthermore, a number of motifs are conserved between the PepMoV C polyprotein sequence and those of the four other sequenced potyviruses, and these are discussed below. A proposed map of mature PepMoV C proteins along the deduced polyprotein, based on the comparative sequence analyses described below, is shown in Fig. 2A. The locations of proposed cleavage sites and conserved motifs are displayed.

[FIG. 2 panels: A. Polyprotein map (Pro-1, HC-pro, Pro-3, CI, NIa, NIb, CP) marking the metal binding site, HC-pro active site, glycosylation site, NTP binding motif, NIa protease active site, and polymerase motif. B. Pro-1/HC-pro cleavage sites. C. HC-pro/Pro-3 cleavage sites. D. PepMoV NIa cleavage sites (Pro-3/CI, CI/6K, 6K/NIa, NIa/NIb, NIb/CP, “6K1”/CI, NIa internal cleavage). E. NIa consensus cleavage sites (PepMoV, PVY, PPV, TVMV, TEV).]

FIG. 2. Proposed map of the PepMoV polyprotein showing the location of putative cleavage sites and the individual viral polypeptides they demarcate. Specific motifs referred to in the text and their locations along the proposed PepMoV polyprotein map are indicated in A. A comparison of the amino acid sequence of the proposed pro-1 protease cleavage sites of PepMoV C, PVY, TVMV, PPV, and TEV is shown in B. A similar comparison of proposed cleavage sites for the HC-pro protease is shown in C. (D) The deduced amino acid sequence around the five proposed PepMoV NIa protease cleavage sites. The sequences of two degenerate consensus NIa protease cleavage sites which may delineate the viral 6K1 and VPg proteins are also shown in D. A consensus PepMoV NIa protease cleavage site based on conservation of amino acid sequence among the five sites is shown in E along with the NIa protease consensus cleavage sites of the four other sequenced potyviruses. The locations of the amino acid sequences within the various viral polyproteins in B-E are indicated in parentheses. In each case, the site of cleavage is indicated by an arrow or a diagonal line. Amino acids conserved in all five viral sequences are displayed in bold letters.

Nuclear inclusion body protein b (NIb). The most conserved of the individual potyviral proteins is the putative core polymerase (NIb protein), which displays >70% similarity among the five sequenced potyviruses. The PepMoV C NIb protein is 82.8% similar to the analogous protein in PVY and somewhat less similar to PPV, TVMV, and TEV, with percentage similarities of 72.8, 72.3, and 71.2, respectively. The NIb protein of potyviruses contains a version of the sequence consensus motif [S(T)GXXXTXXXNS(T) (18 to 37 aa) GDD] which is conserved in a variety of both animal and plant (+) stranded RNA viral RNA-dependent RNA polymerases (Kamer and Argos, 1984). This polymerase motif is present in the PepMoV C-deduced NIb protein in a position analogous to that of other sequenced potyviruses when the polyproteins are aligned based on sequence homology (position 2586-2629 of the PepMoV polyprotein sequence).

Coat protein. The alignment of the coat protein sequence of PepMoV C with that of several other potyviruses, including two strains of PVY, has been reported (Vance et al., 1992). The potyviral coat protein sequences are highly conserved throughout most of the sequence but diverge in sequence and length at the amino-terminus. The coat protein sequence of our confirmed PepMoV isolate is approximately 80% similar to those of both strains of PVY and 71 to 76% similar to those of TEV, PPV, and TVMV. Previous work has demonstrated that strains of the same potyviral species display high levels of coat protein sequence homology (greater than 90% similar) and that this homology level is significantly reduced among distinct viral species, ranging from 40 to 80% similarity (Shukla and Ward, 1988, 1989). Based on the level of coat protein homology, PepMoV C is a viral species distinct from, but closely related to, PVY.

Cytoplasmic inclusion body protein (CI). The putative CI protein sequence of PepMoV C is highly conserved between PepMoV C and PVY (~80% similarity) and significantly less conserved in comparison with the analogous protein sequences of PPV, TEV, and TVMV (61-69% similarity). A nucleotide-binding motif (GAVGSGKST) located near the amino-terminus of the putative CI protein has been previously correlated with helicase-like proteins (Hodgman, 1988). This motif, located at position 1241-1249 of the PepMoV C polyprotein, is strictly conserved in the analogous positions of all the sequenced potyviruses.

Nuclear inclusion body protein a (NIa). The potyviral NIa protein is thought to be multifunctional, with an amino-terminal domain serving as the viral VPg (Dougherty and Parks, 1991; Murphy et al., 1990; Shahabuddin et al., 1988) and the carboxy-terminal domain functioning as a protease. An internal cleavage site separating these two NIa protein domains has been proposed in the TEV polyprotein between a glutamic acid residue at position 2037 and a glycine residue at position 2038 (Dougherty and Parks, 1991). This region of the TEV polyprotein has some features characteristic of the NIa protease consensus cleavage site, and the internal cleavage appears to be mediated by the activity of the TEV NIa protease (Dougherty and Parks, 1991). The analogous region of the PepMoV C polyprotein is a typical NIa protease cleavage site except for the substitution of a glutamic acid (E) for the consensus Q in the P1 position (Fig. 2D). The level of sequence homology of the PepMoV NIa protein with those of other potyviruses (80% similarity with PVY and 61-69% similarity with TEV, TVMV, and PPV) and the presence of the putative internal cleavage site suggest that the PepMoV NIa protein is organized into VPg and protease domains in a manner similar to that deduced for TEV and TVMV (Dougherty and Parks, 1991; Murphy et al., 1990). An amino acid triad (H, D, C) located near the carboxy-terminus of the NIa protein has been shown to constitute the active site residues of the NIa protease domain in TEV (Dougherty et al., 1989).
Alignment of the potyviral polyprotein sequences shows that the location and spacing of these three amino acids are strictly conserved in all the sequenced potyviral NIa proteases, including that of PepMoV C. Thus, the amino acid triad is likely to serve the same function in these related proteases. The putative localization of the PepMoV NIa protease on the deduced polyprotein was analyzed by a functional assay. In this analysis, a nested set of deletion cDNA clones encoding various 3’ subsets of the PepMoV genome was expressed in bacteria. The presence of the PepMoV NIa protease coding region on each particular cDNA was assayed by the ability of the encoded polyprotein to accurately cleave the PepMoV coat protein sequence encoded at the 3’ end of each expressed cDNA. Accurate cleavage by the NIa protease was assayed by the appearance of a protein
FIG. 3. Map of deletion clones of PepMoV genomic RNA and Western blot of deleted proteins expressed in bacteria. The seven cDNA deletion clones (1-7) and their position along the PepMoV genomic RNA sequence are shown in B. The coding regions referred to as pro-1 and pro-3 in the text are labeled P-1 and P-3, respectively, in the polyprotein map. (A) Western blot of proteins isolated from bacteria expressing the various deletion cDNAs and reacted with antiserum specific for the PepMoV coat protein. Lanes 1-7 show proteins from bacteria expressing cDNA clones 1-7, respectively. The position of PepMoV coat protein from purified virions is shown in the lane marked V.
product which comigrated with authentic PepMoV coat protein and was immunoreactive with polyclonal antiserum specific for PepMoV coat protein. Figure 3B shows the structure of each deletion cDNA clone and the localization of the PepMoV gene products as deduced by homology with other sequenced potyviral polyproteins. Figure 3A shows a Western blot of proteins derived from expression of each of the seven deletion clones in E. coli and probed with antiserum specific to PepMoV coat protein sequences. Deletion clones 1-4 all show the expression of mature-sized PepMoV coat protein (Fig. 3A, lanes 1-4, shows proteins from deletion clones 1-4, respectively). However, clones 5 and 6, in which most or all of the sequence homologous to potyviral NIa proteases is deleted, express coat protein sequences as part of a polypeptide larger than mature coat protein (Fig. 3A, lanes 5 and 6, respectively). Deletion clone 7, which is deleted into the coat protein sequence, expresses coat protein sequences as a polypeptide smaller than mature coat protein (Fig. 3A, lane 7). Thus the ability of expressed cDNAs to produce mature-sized PepMoV coat protein
FIG. 4. Similarity plot displaying the degree of similarity at different positions along an alignment of the putative polyproteins of PepMoV, PVY, TVMV, TEV, and PPV. Viral polyproteins were aligned using the progressive alignment method of Feng and Doolittle (1987) and the average similarity among members of the alignment was calculated at each position using a sliding 10-amino-acid window for the comparison. The average similarity across the entire alignment is shown as a dotted line and similarity at each position is plotted along the line. A perfect symbol match has a value of 1.5 and non-matches, depending on evolutionary distance, have values less than 1.5. The position of putative PepMoV proteins along the alignment is shown below the plot. Coding regions referred to as pro-1 and pro-3 in the text are labeled P-1 and P-3, respectively, in the diagram.
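The windowed averaging described in the Fig. 4 legend can be illustrated with a minimal Python sketch. It is not the program used to draw the plot. The match value of 1.5 and the 10-residue window come from the legend. The flat mismatch score of 0.0 is a simplifying assumption, since the legend grades non-matches by evolutionary distance, and the three short aligned sequences are hypothetical.

```python
from itertools import combinations

MATCH_SCORE = 1.5     # value of a perfect symbol match (Fig. 4 legend)
MISMATCH_SCORE = 0.0  # assumption: the real plot grades non-matches by evolutionary distance


def column_similarity(column):
    """Average pairwise score for the residues in one alignment column."""
    pairs = list(combinations(column, 2))
    total = sum(
        MATCH_SCORE if a == b and a != "-" else MISMATCH_SCORE
        for a, b in pairs
    )
    return total / len(pairs)


def similarity_profile(aligned, window=10):
    """Return (per-window mean similarity, overall mean) for an alignment.

    `aligned` is a list of equal-length strings; '-' marks a gap.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    per_column = [column_similarity(col) for col in zip(*aligned)]
    profile = [
        sum(per_column[i:i + window]) / window
        for i in range(length - window + 1)
    ]
    overall = sum(per_column) / length  # analogous to the dotted line in the plot
    return profile, overall


# Hypothetical toy alignment of three 12-residue sequences.
toy_alignment = ["GAVGSGKSTGLP", "GAVGSGKSTALP", "GAVGSGKSTGMP"]
profile, overall = similarity_profile(toy_alignment, window=10)
```

A window that contains only identical columns scores 1.5, and each divergent column pulls the windowed mean below that ceiling, which is how the plot separates conserved from diverged regions.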
is correlated with the presence of NIa protease sequences as deduced by comparative sequence homology.

6K proteins. A small polypeptide termed “6K protein” is located between the CI body protein and the NIa protease on the potyviral polyprotein. This protein in PepMoV is 71.1% similar to the analogous protein in PVY and 59-63% similar to those in TEV, TVMV, and PPV. The amino-terminal cleavage site defining this protein in the sequenced PPV isolate is unusual in that it contains tyrosine in the P’1 position (+1 relative to the peptide bond cleavage) instead of the consensus S, A, or G (Maiss et al., 1989). However, recent evidence suggests that the PPV 6K site is processed (Garcia et al., 1990). The 6K protein was once thought to encode the potyviral VPg, but a variety of recent evidence suggests that this is not the case (Murphy et al., 1990; Shahabuddin et al., 1988) and the function of the 6K protein is currently unknown. A second small protein termed “6K2”, located to the amino-terminus of the CI protein, has been proposed for PPV based on the presence of a consensus NIa protease cleavage site (VVHQ/S) at position 1113-1117 of the PPV polyprotein (Garcia et al., 1989). Alignment of the deduced polyproteins of PepMoV, PVY, TEV, and TVMV with that of PPV indicates that the analogous regions of these viral polyproteins contain degenerate NIa protease cleavage sites, each differing at one amino acid from the consensus cleavage site of the viral NIa protease. In PepMoV, the analogous region contains the sequence VVHQR, with an arginine (R) in the putative P’1 position instead of the consensus A, S, or G (Fig. 2D). It is not known if this potential NIa protease cleavage site is used in any of the viral isolates.
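The comparison of candidate sites against the consensus can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure. It checks only the two positions flanking the scissile bond, using the consensus residues named in the text (Q at P1; S, A, or G at P’1). Upstream consensus positions are not modeled, and the function name `check_site` is hypothetical.

```python
P1_CONSENSUS = {"Q"}                  # residue immediately before the cleaved bond
P1_PRIME_CONSENSUS = {"S", "A", "G"}  # residue immediately after the cleaved bond


def check_site(site):
    """List the flanking positions of a candidate site that deviate from consensus.

    `site` is a string of residues whose last two letters are P1 and P'1,
    e.g. 'VVHQR' has P1 = 'Q' and P'1 = 'R'. Only these two positions are checked.
    """
    if len(site) < 2:
        raise ValueError("site must include at least the P1 and P'1 residues")
    p1, p1_prime = site[-2], site[-1]
    deviations = []
    if p1 not in P1_CONSENSUS:
        deviations.append(("P1", p1))
    if p1_prime not in P1_PRIME_CONSENSUS:
        deviations.append(("P'1", p1_prime))
    return deviations


# PepMoV region analogous to the proposed PPV site: arginine at P'1.
pepmov_site = check_site("VVHQR")
# TEV internal NIa site, given only as its flanking residues (E at P1, G at P'1).
tev_internal = check_site("EG")
```

Under these assumptions the PepMoV sequence VVHQR is flagged only at P’1, and the E/G pair only at P1, which matches the single-residue deviations described in the text.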
Conserved motifs in the amino-terminal portion of the polyprotein. The sequences at the amino-terminal end of the polyprotein are much less conserved among the sequenced potyviruses than are those at the carboxy-terminus. However, at least three motifs in this region are conserved in TEV, TVMV, PPV, and PVY polyproteins and this conservation may now be extended to PepMoV C. The first motif is a “zinc finger”-like metal binding motif [C-8AA-C-13AA-C-4AA-C-2AA-C] which was first noted in the PVY sequence (Robaglia et al., 1989) and is strictly conserved in all five sequenced potyviruses. The second conserved motif is [C-72AA-H], which is present within the putative HC-pro. This motif is required for protease activity of the HC-pro in TEV (Oh and Carrington, 1989) and is tightly conserved among all the sequenced potyviruses and thus may serve the same function in these closely related viruses. The HC-pro of some potyviruses is known to be glycosylated (Berger and Pirone, 1986) and a number of possible glycosylation motifs have been identified from the deduced amino acid sequences of the putative HC-pro region of the genome. One of these glycosylation motifs, at position 453-455 of the PepMoV polyprotein, is conserved in the analogous regions of all the sequenced potyviruses and therefore may be important for the function of this viral protein.

Overall homology among potyviral polyproteins
Comparison of the complete amino acid sequences of PepMoV, PVY, TVMV, TEV, and PPV was used to determine the overall homology levels among the five sequenced potyviruses and computer-assisted alignment of the sequences was used to plot the similarity of different regions along the polyprotein (Fig. 4). The percentage of overall similarity is highest between PepMoV C and PVY polyproteins at 76.3% and significantly lower between PepMoV C and PPV, TVMV, and TEV at 67.5, 65.1, and 65.2%, respectively. The similarity plot analysis indicates that the homology level among the five viruses varies along the polyprotein. In general, the carboxy-terminal 2/3 of the polyprotein, starting at the CI protein and continuing through to the coat protein, is highly conserved among all the potyviruses. The putative HC-pro region is also relatively highly conserved. In contrast, the putative pro-1 and pro-3 proteins and the amino-termini of the coat proteins are much less conserved. These diverged regions of the potyviral polyprotein may function in processes which differentiate the potyviral species, such as host range determination.

ACKNOWLEDGMENTS

The authors gratefully acknowledge Dr. Lewis Bowman for critical reading of the manuscript. This work was supported in part by Sandoz Crop Protection Corp. and in part by Grant DBM-8817233 from the National Science Foundation.
REFERENCES

ABDALLA, O. A., DESJARDINS, P. R., and DODDS, J. A. (1991). Identification, disease incidence, and distribution of viruses infecting peppers in California. Plant Dis. 75, 1019-1023.
ALLISON, R., JOHNSTON, R. E., and DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: Evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
BERGER, P. H., and PIRONE, T. P. (1986). Evidence that potyvirus helper component is a glycoprotein. Phytopathology 76, 1063.
BRADFORD, M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248-252.
BRUENING, G., BEACHY, R., SCALLA, R., and ZAITLIN, M. (1976). In vivo and in vitro translation of the ribonucleic acids of a cowpea strain of tobacco mosaic virus. Virology 71, 498-517.
CARRINGTON, J. C., CARY, S. M., PARKS, T. D., and DOUGHERTY, W. G. (1989a). A second proteinase encoded by a plant potyvirus genome. EMBO J. 8, 365-370.
CARRINGTON, J. C., FREED, D. D., and SANDERS, T. S. (1989b). Autoproteolytic processing of potyvirus proteinase HC-Pro in Escherichia coli and in vitro. J. Virol. 63, 4459-4463.
CARRINGTON, J. C., and HERNDON, K. L. (1992). Characterization of the potyviral HC-Pro autoproteolytic cleavage site. Virology 187, 308-315.
DEMEJIA, M. V. G., HIEBERT, E., PURCIFULL, D. E., THORNBURY, D. W., and PIRONE, T. P. (1985). Identification of potyviral amorphous inclusion protein as a nonstructural, virus-specific protein related to helper component. Virology 142, 34-43.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F. E., LOMONOSSOFF, G. P., SHAW, J. G., and RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Res. 14, 5417-5430.
DOUGHERTY, W. G., and CARRINGTON, J. C. (1988). Expression and function of the potyviral gene products. Annu. Rev. Phytopathol. 26, 123-143.
DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M., and PARKS, T. D. (1988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO J. 7, 1281-1287.
DOUGHERTY, W. G., and PARKS, T. D. (1991). Post-translational processing of the tobacco etch virus 49-kDa small nuclear inclusion polyprotein: Identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456.
DOUGHERTY, W. G., PARKS, T. D., CARY, S. M., BAZAN, J. F., and FLETTERICK, R. J. (1989). Characterization of the catalytic residues of the tobacco etch virus 49-kDa proteinase. Virology 172, 302-310.
FENG, D., and DOOLITTLE, R. F. (1987). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
GALLIE, D. R., SLEAT, D. E., WATTS, J. W., TURNER, P. C., and WILSON, T. M. A. (1987). A comparison of eukaryotic viral 5’-leader sequences as enhancers of mRNA expression in vivo. Nucleic Acids Res. 15, 8693-8711.
GARCIA, J. A., LAIN, S., CERVERA, M. T., RIECHMANN, J. L., and MARTIN, M. T. (1990). Mutational analysis of plum pox potyvirus polyprotein processing by the NIa proteinase in Escherichia coli. J. Gen. Virol. 71, 2773-2779.
GARCIA, J. A., RIECHMANN, J. L., and LAIN, S. (1989). Artificial cleavage site recognized by the plum pox potyvirus in Escherichia coli. J. Virol. 63, 2457-2460.
HODGMAN, T. C. (1988). A conserved NTP-motif in putative helicases. Nature (London) 333, 22-23.
LÜTCKE, H. A., CHOW, K. C., MICKEL, F. S., MOSS, K. A., KERN, H. F., and SCHEELE, G. A. (1987). Selection for AUG initiation codons differs in plants and animals. EMBO J. 6, 43-48.
KAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.
MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D., and KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. J. Gen. Virol. 70, 513-524.
MAVANKAL, G., and RHOADS, R. E. (1991). In vitro cleavage near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731.
MURPHY, J. F., RHOADS, R. E., HUNT, A. G., and SHAW, J. G. (1990). The VPg of tobacco etch virus RNA is the 49-kDa proteinase or the N-terminal 24-kDa part of the proteinase. Virology 178, 285-288.
NELSON, M. R., and WHEELER, R. E. (1972). A new virus disease of pepper in Arizona. Plant Dis. Rep. 56, 731-735.
NELSON, M. R., and WHEELER, R. E. (1978). Biological and serological characterization and separation of potyviruses that infect peppers. Phytopathology 68, 979-984.
OH, C-S., and CARRINGTON, J. C. (1989). Identification of the essential residues in potyvirus proteinase HC-Pro by site-directed mutagenesis. Virology 173, 692-699.
PURCIFULL, D. E., HIEBERT, E., and MCDONALD, J. G. (1973). Immunochemical specificity of cytoplasmic inclusions induced by viruses in the potato Y group. Virology 55, 275-279.
PURCIFULL, D. E., ZITTER, T. A., and HIEBERT, E. (1975). Morphology, host range and serological relationships of pepper mottle virus. Phytopathology 65, 559-562.
ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S., and CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. J. Gen. Virol. 70, 935-947.
SHAHABUDDIN, M., SHAW, J. G., and RHOADS, R. E. (1988). Mapping of the tobacco vein mottling virus VPg cistron. Virology 163, 635-637.
SHUKLA, D. D., and WARD, C. W. (1988). Amino acid sequence homology of coat proteins as a basis for identification and classification of the potyvirus group. J. Gen. Virol. 69, 2703-2710.
SHUKLA, D. D., and WARD, C. W. (1989). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Arch. Virol. 106, 171-200.
TURPEN, T. (1989). Molecular cloning of a potato virus Y genome: Nucleotide sequence homology in non-coding regions of potyviruses. J. Gen. Virol. 70, 1951-1960.
VANCE, V. B. (1991). Replication of potato virus X RNA is altered in coinfections with potato virus Y. Virology 182, 486-494.
VANCE, V. B., JORDAN, R., EDWARDSON, J. R., CHRISTIE, R., PURCIFULL, D. E., TURPEN, T., and FALK, B. (1992). Evidence that pepper mottle virus and potato virus Y are distinct viruses: Analyses of the coat protein and 3’ untranslated sequence of a California isolate of pepper mottle virus. Arch. Virol. [supplement 5], 337-345.
VERCHOT, J., KOONIN, E. V., and CARRINGTON, J. C. (1991). The 35-kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded proteinase. Virology 185, 527-535.
ZITTER, T. A. (1972). Naturally occurring pepper virus strains in Florida. Plant Dis. Rep. 56, 586-590.