VIROLOGY 161, 73-80 (1987)

Sequence and Organization of Southern Bean Mosaic Virus Genomic RNA

SHIXUAN WU,1 CLAIRE A. RINEHART, AND PAUL KAESBERG2

Biophysics Laboratory and Biochemistry Department, University of Wisconsin, Madison, Wisconsin 53706

Received January 20, 1987; accepted July 7, 1987
The genomic RNA sequence of the cowpea strain of southern bean mosaic virus (SBMV-C) has been determined. The genome is 4194 nucleotides in length and has four open reading frames. A 5’ proximal open frame, from base 49 to base 603, corresponds to the length of the P4 proteins translated in cell-free extracts from full-length and smaller virion RNA. The largest open frame extends from base 570 to base 3437 and encodes the two largest proteins translated in cell-free extracts from full-length virion RNA. Segments of this open reading frame’s predicted amino acid sequence resemble those of known viral RNA polymerases, ATP-binding proteins, and viral genome-linked proteins. A third open frame extends from base 1895 to base 2380 and has not been correlated with an in vitro translation product. The fourth open reading frame is located in the 3’ terminal region of the genome, extending from base 3217 to base 4053. This frame encodes the SBMV capsid protein, which is translated from subgenomic virion RNA. © 1987 Academic Press, Inc.
1 Present address: Institute of Microbiology, Academia Sinica, Beijing, China.
2 To whom reprint requests should be addressed.
0042-6822/87 $3.00. Copyright © 1987 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION
Southern bean mosaic virus (SBMV) is a small, icosahedral plant virus belonging to the Sobemovirus group (Sehgal, 1981). The host range and other biological properties of two diverse strains, the cowpea strain (SBMV-C) and the bean strain (SBMV-B), have been studied extensively (Sehgal, 1981). The SBMV genome consists of a single, positive-sense RNA of molecular weight 1.4 × 10⁶ (Mang et al., 1982). The genomic RNA is covalently linked at its 5’ terminus to a small protein (VPg) (Mang et al., 1982), which is essential for the infectivity of SBMV RNA (Veerisetty and Sehgal, 1979). SBMV virions contain 180 copies of a single capsid protein, molecular weight 28,214, arranged on a T = 3 lattice (Hermodson et al., 1982). The tertiary structure of the SBMV-C capsid protein is known at 2.8 Å resolution from the elegant crystallographic work of the Rossmann laboratory (Abad-Zapatero et al., 1980; Abdel-Meguid et al., 1981; McCain et al., 1982; Hermodson et al., 1982; Rossmann et al., 1983). SBMV virions have been assembled in vitro from isolated protein and RNA (Hsu et al., 1977; Savithri and Erickson, 1983). In this paper we present the nucleotide sequence of SBMV-C genomic RNA and use it to delineate the open reading frames and encoded proteins.

MATERIALS AND METHODS
Biochemicals and enzymes were obtained from the following sources: restriction enzymes (New England Biolabs); deoxy- and dideoxyribonucleotides (P-L Biochemicals); [γ-32P]ATP (5000 Ci/mmol), [α-32P]dATP (400-800 Ci/mmol), and [α-thio-35S]dATP (650 Ci/mmol) (Amersham); and M13 primers (New England Biolabs). SBMV-C was propagated in Vigna sinensis L. var. Black Eye, and virions were isolated by the method of Hull (1977). Virion RNA was extracted from dissociated virus particles according to Zimmern (1975). Single-stranded SBMV-C cDNA was prepared with reverse transcriptase after random priming with partially digested calf thymus DNA (Taylor et al., 1976) or with synthetic oligonucleotides prepared with a New England Biolabs DNA synthesis kit. Double-stranded DNA formation was prevented with actinomycin D (40 µg/ml). The single-stranded cDNA either was labeled and sequenced directly or was fragmented with the restriction enzymes HaeIII, HpaII, and TaqI (Dasgupta and Kaesberg, 1982; Rice and Strauss, 1981) prior to labeling. After phenol extraction, cDNA fragments were 5’ labeled with 32P, fractionated on 8% polyacrylamide/7 M urea gels, eluted, and sequenced by chemical methods (Maxam and Gilbert, 1980; Maniatis et al., 1982).
Double-stranded cDNA was synthesized by using reverse transcriptase and synthetic primers complementary to the 3’ ends of positive- and negative-strand SBMV RNA. The ds-cDNA was then ligated into the SmaI site of pUC13 and transformed into competent Escherichia coli cells. Double-stranded cDNA was also prepared and cloned into pBR322 as described by Ahlquist et al. (1981b). Fragments of cloned cDNA were prepared by digestion with suitable restriction enzymes, treated with phosphatase, and labeled with 32P by using polynucleotide kinase. The strands were separated according to Maxam and Gilbert (1980) and sequenced by chemical methods (Maxam and Gilbert, 1980; Maniatis et al., 1982). Some cloned cDNA fragments were also subcloned into M13 vectors and sequenced by the dideoxyribonucleotide method with [α-thio-35S]dATP (Sanger et al., 1977). Sequence data were assembled with the program package of Staden (1980) and analyzed with the UWGCG programs (Devereaux et al., 1984) run on a VAX 11/780 computer. Comparisons between SBMV and other viral proteins were made by using the UWGCG programs DOTPLOT, GAP, and PRETTY.

FIG. 1. Sequence of SBMV-C RNA together with the amino acid translation of the open reading frames. Dots over the sequence are spaced every 10 bases.

RESULTS AND DISCUSSION
The SBMV nucleotide sequence (4194 bases) is shown in Fig. 1 together with the predicted amino acid sequences of the four largest open reading frames (ORFs). The following section describes the sequencing strategies employed, and subsequent sections describe features of SBMV that have been derived from the primary sequence.
FIG. 2. SBMV genomic RNA1 sequencing strategy. (A) Region reported by Mang et al. (1982). Maxam and Gilbert chemical sequencing of negative-strand cDNA prepared by using (B) calf thymus DNA fragments as primers or (C) synthetic oligonucleotide primers. The single-stranded cDNA was fragmented with the restriction enzymes shown in (B) (HaeIII, HpaII, and TaqI) prior to labeling and sequencing. (D) Chemical sequencing of strand-separated cDNA prepared from several cDNA clones. (E) Sanger dideoxy sequencing of cDNA subcloned into M13.

Most of our data were generated by chemical sequencing procedures (Maxam and Gilbert, 1980) applied to negative-strand cDNA or fragments of it (Figs. 2B and 2C). This procedure, while effective, left some gaps and failed to resolve ambiguities in several regions. Double-stranded viral cDNA was therefore prepared, cloned, and sequenced on both the positive and negative strands by chemical (Fig. 2D) or dideoxy (Fig. 2E) methods. The SBMV sequence was determined from both the positive and negative strands for 83% of the genome, and the remainder was unambiguously determined from either the plus or the minus strand.
The 3’ termini of SBMV-C and SBMV-B genomic RNAs are known from previous chemical and enzymatic sequencing (Fig. 2A; Mang et al., 1982). It is also known that the 5’-terminal residue is blocked with a protein moiety, designated SBMV VPg, and that the 3’ terminus carries a free hydroxyl group (Ghosh et al., 1979). Because the structure of the VPg and its mode of attachment to the 5’ terminus are not known, we cannot rule out the possibility that additional 5’-terminal nucleotide residues exist as part of the blocking structure. The sequence presented in Fig. 1 includes all bases adjacent to SBMV VPg that are copied by reverse transcriptase. Analogous experiments with rhinovirus 2 (Skern et al., 1985), rhinovirus 14 (Stanway et al., 1984), and encephalomyocarditis virus RNA (A. C. Palmenberg, personal communication) have shown that all bases adjacent to their respective VPgs are copied in similar reactions.
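The two-strand figure quoted above (83% of the genome read on both strands) is an interval-coverage statistic. As a minimal sketch of how such a figure can be computed, the Python below assumes that each sequencing read is recorded as an inclusive, 1-based (start, end) interval on the genome. The function names and the example intervals are hypothetical; they are not the authors' data or software (the authors assembled their data with the Staden package).

```python
def covered_positions(intervals):
    """Set of 1-based genome positions covered by inclusive (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def strand_coverage(genome_length, plus_reads, minus_reads):
    """Return (fraction read on both strands, fraction read on at least one strand)."""
    plus = covered_positions(plus_reads)
    minus = covered_positions(minus_reads)
    both = plus & minus      # positions confirmed by both strands
    either = plus | minus    # positions read on at least one strand
    return len(both) / genome_length, len(either) / genome_length


if __name__ == "__main__":
    # Hypothetical read intervals on a 4194-base genome (illustration only).
    plus_reads = [(1, 1500), (1400, 3000)]
    minus_reads = [(300, 2200), (2100, 4194)]
    both, either = strand_coverage(4194, plus_reads, minus_reads)
    print(f"both strands: {both:.1%}, at least one strand: {either:.1%}")
```

Reporting both fractions makes explicit that any position outside the "both strands" set rests on a single strand.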
Structural characteristics of the noncoding regions of SBMV RNA
Most viral RNA genomes are blocked at their 5’ termini either by a cap structure or by a covalently attached protein (VPg). 3’ termini generally bear a hydroxyl group, but this is frequently preceded by a poly(A) sequence of varying length or by a sequence capable of forming a distinctive tertiary structure, which in some cases can be aminoacylated. The mechanism of viral RNA replication depends explicitly on these terminal structural elements (reviewed by Strauss and Strauss, 1983). In picornaviruses, VPgs attached to the terminal nucleotides (VPg-pUpU) have been implicated as primers for the initiation of both positive- and negative-strand RNA synthesis (reviewed by Rueckert, 1985). Furthermore, picornaviruses and other VPg-containing viruses, such as cowpea mosaic virus (CPMV) and the potyviruses, have 3’ poly(A) termini. It may be inferred that the VPg carried by SBMV RNA is also involved in viral RNA replication. However, SBMV RNA carries a 5’-terminal C and a 3’-terminal G and lacks poly(A), which indicates that the SBMV RNA termini are distinctive structurally, at least, and possibly mechanistically. Computer analysis of SBMV RNA with the secondary structure folding program of Zuker and Stiegler (1981) is inconsistent with a tightly structured, tRNA-like 3’ terminus such as that possessed by some classes of plant viruses (Ahlquist et al., 1981a); the SBMV RNA 3’ terminus appears to be only weakly base paired.
The 5’-terminal region of viral RNAs is also implicated in the initiation of translation. SBMV RNA has a relatively low G + C content (34%) in the 5’ nontranslated region, similar to other RNA viral genomes (Ahlquist et al., 1981b; Briand et al., 1978; Koper-Zworthoff et al., 1980). Within this nontranslated region is a segment partially complementary to the 3’ end of 18 S ribosomal RNA, suggesting a possible role in ribosome attachment (Hagenbüchle et al., 1978). This putative ribosome binding site (bases 26-42) lies 7 bases 5’ to the first AUG codon and could potentially form 12 bp (out of 18 residues) with 18 S ribosomal RNA (Fig. 3).

FIG. 3. Base pairing of the 5’ untranslated region of SBMV with the consensus sequence of 18 S ribosomal RNA 3’ termini from D. discoideum, B. mori, mouse, and wheat (Hagenbüchle et al., 1978). The overlined AUG(49) is the first codon in ORF 1.
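Both properties of the 5’ leader discussed above, its G + C content and its potential to pair with the 3’ end of 18 S rRNA, are simple to compute. The sketch below is illustrative only. It assumes ungapped, antiparallel pairing in which Watson-Crick pairs and G·U wobble pairs each count as one base pair, which may not match the criteria behind Fig. 3. The rRNA string is the eucaryotic 18 S rRNA 3’-terminal sequence written 3’ to 5’ (AUUACUAGGAAGGCGUCC); the function names and test sequences are hypothetical, and no SBMV sequence is used.

```python
# Watson-Crick pairs plus G.U wobble pairs (assumption: a wobble pair counts as a pair).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# 3'-terminal 18 S rRNA sequence written 3' -> 5', so that it can be laid
# antiparallel under an mRNA segment written 5' -> 3'.
RRNA_18S_3_TO_5 = "AUUACUAGGAAGGCGUCC"


def gc_content(rna):
    """Fraction of G and C residues in a sequence."""
    rna = rna.upper()
    return (rna.count("G") + rna.count("C")) / len(rna)


def count_pairs(mrna_5to3, rrna_3to5):
    """Count positions that can pair when the strands are aligned antiparallel, without gaps."""
    return sum((a, b) in PAIRS for a, b in zip(mrna_5to3.upper(), rrna_3to5.upper()))


def best_ungapped_pairing(mrna_5to3, rrna_3to5=RRNA_18S_3_TO_5):
    """Slide the rRNA along a longer mRNA; return (max pairs, 0-based offset in the mRNA)."""
    width = len(rrna_3to5)
    best_count, best_offset = 0, 0
    for offset in range(len(mrna_5to3) - width + 1):
        n = count_pairs(mrna_5to3[offset:offset + width], rrna_3to5)
        if n > best_count:  # keep the first register with the highest count
            best_count, best_offset = n, offset
    return best_count, best_offset


if __name__ == "__main__":
    toy_leader = "CCUAAUGAUCCC"  # invented sequence, not SBMV
    print(gc_content(toy_leader))
    print(best_ungapped_pairing(toy_leader, "AUUACUAG"))
```

Counting G·U pairs is a deliberate choice here because they are common in RNA duplexes; dropping them from `PAIRS` gives a stricter Watson-Crick-only count.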
Open reading frames and viral proteins
Figure 4 illustrates the positions of all initiation (AUG) and termination (UAG, UAA, UGA) codons within the SBMV-C sequence. Phase 1 contains two open reading frames (ORF 1 and ORF 4). The first open reading frame extends from base 49 to base 603 (ORF 1), as indicated by the filled rectangle. The second open frame in this phase (ORF 4) extends from base 3217 to base 4053. The filled portion of this rectangle corresponds to the SBMV capsid protein cistron, which is silent on the genomic RNA and is expressed via RNA2, the subgenomic messenger RNA. Phase 2 contains the third open reading frame (ORF 3), which extends from base 1895 to base 2380. Phase 3 contains a large open reading frame (ORF 2), which overlaps the other three frames to varying degrees. ORF 2 begins at base 570 and continues through base 3437.
We have attempted to correlate the major open reading frames defined by the SBMV sequence with the observed products of translation in cell-free extracts. RNA isolated from SBMV virions contains primarily full-length genomic RNA but also submolar amounts of fragmented genomic RNA and of subgenomic SBMV RNA2. Together, these virion-extracted RNAs can be translated in cell-free extracts to yield proteins designated P1 (105 kDa), P2 (60 kDa), P3 (28 kDa), and P4 (a family of three proteins of 21-25 kDa) (Mang et al., 1982; our unpublished results). Proteins P1, P2, and P4 are translated from genomic-length RNA, whereas P3 is translated from RNA2 (Mang et al., 1982). The P4 proteins are also translated from a variety of RNAs shorter than genomic length (Mang et al., 1982). Tryptic analysis of protein P2 from the bean strain has shown it to be a subset of the larger P1 protein, and P3 has been shown to be the capsid protein (Mang et al., 1982; Salerno-Rife et al., 1980). The P4 proteins have been shown to be structurally similar to each other but distinct from P1 (Mang et al., 1982).
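Figure 4 is, in effect, a scan of the three positive-sense phases for AUG and termination codons. A minimal Python sketch of such a scan follows. It assumes that an ORF runs from the first in-frame AUG to the next in-frame stop codon (so a shorter ORF nested inside a longer one in the same phase is not reported separately), uses 1-based inclusive coordinates with the stop codon included, and numbers phases so that phase 1 begins at base 1. The function names are hypothetical, and the ORF boundaries drawn in Fig. 4 may have been defined somewhat differently (for example, stop-to-stop).

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
# Standard genetic code, codons enumerated in U, C, A, G order ("*" marks a stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(rna):
    """Translate an RNA string codon by codon using the standard genetic code."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        a, b, c = (BASES.index(base) for base in rna[i:i + 3])
        protein.append(AMINO_ACIDS[16 * a + 4 * b + c])
    return "".join(protein)


def find_orfs(rna, min_codons=1):
    """Return (phase, start, end) for AUG-to-stop ORFs on the positive strand.

    Coordinates are 1-based and inclusive; `end` is the last base of the stop codon.
    Phase 1 reads codons from base 1, phase 2 from base 2, phase 3 from base 3.
    Open frames that run off the 3' end without a stop codon are not reported.
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for offset in range(3):
        start = None
        for i in range(offset, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # sense codons before the stop
                    orfs.append((offset + 1, start + 1, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "GGAUGGCUUAAGG"  # invented sequence, not SBMV
    for phase, start, end in find_orfs(toy):
        print(phase, start, end, translate(toy[start - 1:end]))  # prints: 3 3 11 MA*
```

Applied to the full 4194-base sequence with a sensible `min_codons` threshold, a scan of this kind yields the per-phase AUG and stop positions that Fig. 4 displays graphically.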
FIG. 4. Open reading frames in the three positive-sense translational phases. Each line shows one of the translational phases. The tick marks on the upper side of each phase show the positions of AUG codons; the downward ticks show the positions of termination codons (UAG, UAA, UGA). The boxes show the open reading frames. The filled portion of each box is known to code for protein in vitro.

How can the pattern of ORFs predicted by the sequence be reconciled with the complex pattern of translation products? Some relationships are clear, while others are speculative at best. ORF 1 encodes 185 amino acids with a molecular weight of 21 kDa, suggesting that it codes for the P4s. The location of ORF 1 at the 5’ end of the genome is consistent with translation of the P4s from both genomic and 3’-degraded RNA. Conceivably, the P4s are a single protein that is post-translationally modified in vitro into electrophoretically distinct forms. ORF 2 encodes a 105-kDa polypeptide and is the only frame large enough to code for protein P1 (105 kDa); we thus infer that P1 sequences reside in ORF 2. Since P1 is expressed from genomic-length RNA, it must initiate translation at a codon other than the 5’-proximal AUG(49-51). ORF 2 overlaps the 3’ end of ORF 1 by 34 bases. Some ribosomes may scan through AUG(49-51) and initiate translation at AUG(570-572) of ORF 2. Alternatively, ribosomes may translate ORF 1 into P4 (bases 49-603) and then reinitiate translation in ORF 2 at AUG(615-617) to yield a 103-kDa protein. By the criteria of Kozak (1983), derived primarily from animal systems, the initiating codon for P4, AUG(49-51), is in a poor context for initiation of translation. However, Lütcke et al. (1987) found that the efficiency of translational initiation in plant extracts is independent of the codon context 3 bases upstream of an AUG. Thus we are unable to identify, from the sequence alone, the amino-terminal end of P1. Protein P2 is a subset of P1, and its sequence must reside in ORF 2. Possibly P2 is derived from P1 by proteolytic processing, which would be consistent with the expression strategies observed in several other VPg-containing positive-sense RNA viruses (Domier et al., 1987). ORF 4 potentially encodes a 30.6-kDa protein. P3, the capsid protein, is 28 kDa (Hermodson et al., 1982) and is expressed from this ORF via subgenomic RNA2 (Ghosh et al., 1981). The amino-terminal residue of the capsid protein is an acetylated alanine (Hermodson et al., 1982), corresponding to the GCU(3274-3276) codon. Since the capsid protein is the only product observed from RNA2, translation probably begins at the second AUG(3271-3273) and is followed by post-translational removal of the N-terminal methionine and acetylation of the exposed alanine, as for other viral capsid proteins (reviewed by Driessen et al., 1985). Examination of the nucleotide context of the first two AUGs shows both to be favorable for initiation of translation, and we infer that the 5’ end of the subgenomic RNA may lie between the two AUGs. Comparison of the capsid protein amino acid sequence reported by Hermodson et al. (1982) with that predicted from our nucleotide sequence shows identity in all but two residues. Hermodson et al. (1982) derived the capsid sequence by direct protein sequencing for all but four residues; the identities of these four were inferred from the partial nucleotide sequence reported by Mang et al. (1982). One of these residues (codon at bases 3949-3951) was reported as threonine, but our sequence identifies this codon as isoleucine, which is in fact more consistent with the amino acid composition originally reported by Hermodson et al. (1982). The second difference, an aspartate vs a valine codon at bases 3517-3519, is probably real and may reflect a mutation in the interim between the two sequencing projects. ORF 3 overlaps ORF 2 and is capable of encoding a protein of 18.3 kDa. We have not found evidence for cell-free translation of this sequence from either full-length or subgenomic-length RNAs isolated from virions. Possibly a subgenomic RNA expressing this ORF is present in infected tissue but is not efficiently packaged into virions.
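Several statements above are arithmetic on the ORF coordinates: the reading phase of each frame, the 34-base overlap of ORFs 1 and 2, whether a cited codon lies in frame with ORF 4, and rough molecular weights. The sketch below recomputes these from the coordinates given in the text. The mean residue mass of 110 Da is a rule-of-thumb assumption (the molecular weights quoted above were presumably computed from actual amino acid compositions), so the estimates agree with the quoted values only approximately. `codons()` counts every complete triplet in an interval, so whether a stop codon is included depends on how the endpoints were defined. All helper names are hypothetical.

```python
# ORF coordinates (1-based, inclusive) as given in the text.
ORFS = {
    "ORF 1": (49, 603),
    "ORF 2": (570, 3437),
    "ORF 3": (1895, 2380),
    "ORF 4": (3217, 4053),
}
MEAN_RESIDUE_MASS_DA = 110.0  # assumption: rough average mass of one amino acid residue


def phase(position):
    """Translational phase (1, 2, or 3) of a codon that starts at `position`."""
    return (position - 1) % 3 + 1


def codons(orf):
    """Number of complete triplets in an inclusive (start, end) interval."""
    start, end = orf
    return (end - start + 1) // 3


def overlap(a, b):
    """Number of bases shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def approx_kda(orf):
    """Very rough size estimate: codon count times an assumed mean residue mass."""
    return codons(orf) * MEAN_RESIDUE_MASS_DA / 1000


if __name__ == "__main__":
    for name, orf in ORFS.items():
        print(f"{name}: phase {phase(orf[0])}, {codons(orf)} codons, ~{approx_kda(orf):.0f} kDa")
    print("ORF 1 / ORF 2 overlap:", overlap(ORFS["ORF 1"], ORFS["ORF 2"]), "bases")
    # Codons cited for the capsid protein should share the phase of ORF 4.
    for codon_start in (3271, 3274, 3517, 3949):
        print(codon_start, phase(codon_start) == phase(ORFS["ORF 4"][0]))
```

The same overlap check shows that ORF 3 (bases 1895-2380) lies wholly inside ORF 2 and shares no bases with ORF 1.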
With some viruses, sequence data and the results of in vitro translation have correlated precisely and have yielded an enumeration of the virus-encoded genes. Such data have been less revealing for SBMV, where several questions remain, such as the number of functional proteins carved out of ORF 2 and the location of the VPg. We have begun addressing these problems by examining homologies with other viruses (described below), but in the final analysis further experiments will be needed to identify the complete set of SBMV proteins and their coding domains.
Sequence similarity to proteins of other viruses
Regions of sequence conservation that may imply functional similarity have been identified in several viruses. Haseloff et al. (1984) and Ahlquist et al. (1985) found three conserved domains in the nonstructural proteins of four unrelated positive-strand RNA viruses. Kramer and Argos (1984) found sequences corresponding to one of these conserved domains in eight different viral genomes and suggested that the highly conserved residues might span the active site or nucleic acid recognition site of an RNA-dependent RNA polymerase. Gorbalenya and co-workers (1985) have correlated another of these domains with the binding pocket of ATP-utilizing enzymes found in viruses, bacteria, and eucaryotes. We therefore searched the predicted amino acid sequences of the SBMV ORFs for regions corresponding to these domains and also for sequence similarities to the VPgs found in other viruses.
The comparison of RNA and DNA polymerase sequences by Kramer and Argos (1984) identified a well-conserved tripeptide, Gly-Asp-Asp (GDD), that is characteristic of this type of enzyme. The SBMV sequence has one GDD triplet, at bases 2946-2954. Alignments with other positive-strand RNA viral polymerases were centered on this GDD region. Figure 5A shows a consensus sequence for the eight viral polymerases analyzed by Kramer and Argos. Three of the more conserved regions of this consensus are presented together with the corresponding SBMV sequence. The sequence similarity is substantial, and we interpret it as evidence for an RNA polymerase function encoded in this region.
In ATP-utilizing enzymes the active site is highly conserved and has the sequence NH3-...GxxxxGK(S or T)... (Higgins et al., 1986; Gorbalenya et al., 1985). In SBMV there is only one region that closely resembles this pattern (Fig. 5B). Higgins et al. (1986) have recently found further regions of conservation carboxy-proximal to this site in several bacterial enzymes. We have also found these extended regions of similarity in SBMV (Fig. 5B), and we regard this as preliminary evidence for an ATP-binding function for this region of ORF 2.
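The two signatures discussed above, the GDD tripeptide and the GxxxxGK(S or T) nucleotide-binding pattern (often called the P-loop or Walker A motif), can be screened for with regular expressions, and a hit can be mapped back to nucleotide coordinates from the ORF start. The sketch below is a crude exact-pattern screen, not the alignment-based comparison (UWGCG GAP with a scoring table) on which the results above rely; a region that merely resembles a pattern will be missed. The function names and the toy peptide are hypothetical.

```python
import re

GDD = "GDD"
# G, any four residues, G, K, then S or T (the pattern quoted in the text).
P_LOOP = "G.{4}GK[ST]"


def find_motif(protein, pattern):
    """1-based residue positions of every match, including overlapping ones."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein)]


def residue_to_nucleotide(orf_start, residue_position):
    """First base of the codon for a 1-based residue position within an ORF."""
    return orf_start + 3 * (residue_position - 1)


if __name__ == "__main__":
    toy = "MAGDDKGDDSAAGPSGSGKSTL"  # invented peptide, not an SBMV sequence
    print("GDD at residues", find_motif(toy, GDD))
    print("GxxxxGK[ST] at residues", find_motif(toy, P_LOOP))
    # Residue 793 of ORF 2 (which starts at base 570) begins at base 2946,
    # the first base of the GDD triplet cited in the text.
    print(residue_to_nucleotide(570, 793))
```

A screen like this is useful mainly for locating candidate regions quickly; deciding whether a candidate is a genuine homolog still requires an alignment against the consensus, as in Fig. 5.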
In another comparison, we aligned the VPg sequences from 17 picornaviruses and from CPMV to derive a consensus, which was then used to search SBMV ORFs 1 and 2. The best protein alignment begins at base 2067 and is shown in Fig. 5C. Although the similarity between SBMV and the consensus sequence is weak, it is significant compared with randomized sequences of the same composition. While this positioning of the VPg is only suggestive, it is consistent with the location of the VPg relative to the polymerase in the comovirus, picornavirus, and potyvirus genomes (Domier et al., 1987). If the correlation in genomic organization between SBMV and these viruses holds true, we might expect to find a viral protease encoded between the polymerase and VPg domains. While we have not found extensive matches to other viral proteases in this region, it does contain a number of cysteine and histidine residues, the kind of residues that are essential to catalysis by cysteine proteases. It is evident that the encoded functions of SBMV and their placement in the viral genome have features reminiscent of other well-known positive-sense RNA viruses, but in toto SBMV’s characteristics differ from those of any other known virus. SBMV’s synthesis of a number of proteins from such a small genome appears to require a mixture of expression strategies, including overlapping reading frames, internal initiation or reinitiation of translation, production of subgenomic RNA, and perhaps post-translational proteolytic processing and modification.

(A)
CONSENSUS         EVDFSKFD-S
SBMV(2661-2690)   EADISGFDWS
CONSENSUS         SG-S-TFIINTIL
SBMV(2859-2897)   SGSYCTSSTNSRI
CONSENSUS         I--GDDSLI
SBMV(2937-2963)   IAHGDDSVE
(B)
CONSENSUS         I-LRGKAG-GKSL-BN-IAR
SBMV(1626-1682)   IENRGKVKLGKREFAWVPK
CONSENSUS         QNPDGAMS
SBMV(1773-1796)   ESFDGALP
CONSENSUS         MIS-LEEK
SBMV(1923-1940)   LAS.LE.K
(C)
CONSENSUS         GPYAGP-QRQ-PLKVKAK-P
SBMV(2067-2126)   SLFPPKPRATSSKPITTSSP
FIG. 5. Amino acid alignment of nonstructural domains identified with putative functions. The numbers following SBMV delineate the range of nucleotide sequence coding for the amino acids displayed. Consensus sequences were constructed by aligning several viral sequences described below. All alignments were made using the GAPOUT program (Devereaux et al., 1984) and the peptide scoring table of Staden (1982), with equivalent amino acids having values of 1.1 or greater. Equivalent amino acids at this stringency are AG, AP, AS, AT, DE, DG, DH, DN, DQ, EH, EN, EQ, FI, FL, FY, GS, HN, HQ, HR, IL, IM, IV, KN, KQ, KR, LM, LV, MV, NQ, NS, PS, QR, RW, and ST. The letters in the consensus lines represent the most prevalent amino acid with a homology of at least 60%, and the hyphens show where the homology is less than 60%. The dots between the lines mark identical or equivalent residues between the two sequences. The periods in the SBMV lines indicate where gaps have been introduced. A double asterisk indicates that all aligned residues in the consensus sequence at that position are identical; a single asterisk indicates that all the aligned residues are equivalent. (A) The putative RNA-dependent RNA polymerase domain. The consensus sequence was derived from Kramer and Argos (1984). (B) The putative ATP-binding domain. A consensus sequence was obtained from an alignment of 18 picornaviral 2C regions (A. C. Palmenberg, personal communication) and an alignment of alfalfa mosaic virus, brome mosaic virus, and tobacco mosaic virus (Haseloff et al., 1984). (C) The putative VPg domain. The consensus sequence was obtained from an alignment of VPgs from 18 picornaviruses and cowpea mosaic virus (A. C. Palmenberg, personal communication).

ACKNOWLEDGMENTS
We thank Cindy Johnson for technical assistance. This work was supported by Public Health Service Research Grants AI-1466 and AI-15342, Postdoctoral Training Grant T32-CA09075, and Career Award AI-21942.
REFERENCES
ABAD-ZAPATERO, C., ABDEL-MEGUID, S. S., JOHNSON, J. E., LESLIE, A. G. W., RAYMENT, I., ROSSMANN, M. G., SUCK, D., and TSUKIHARA, T. (1980). Structure of southern bean mosaic virus at 2.8 Å resolution. Nature (London) 286, 33-39.
ABDEL-MEGUID, S. S., YAMANE, T., FUKUYAMA, K., and ROSSMANN, M. G. (1981). The location of calcium ions in southern bean mosaic virus. Virology 114, 81-85.
AHLQUIST, P., DASGUPTA, R., and KAESBERG, P. (1981a). Near identity of 3’ RNA secondary structure in bromoviruses and cucumber mosaic virus. Cell 23, 183-189.
AHLQUIST, P., LUCKOW, V., and KAESBERG, P. (1981b). Complete nucleotide sequence of brome mosaic virus RNA3. J. Mol. Biol. 153, 23-38.
AHLQUIST, P., STRAUSS, E. G., RICE, C. M., STRAUSS, J. H., HASELOFF, J., and ZIMMERN, D. (1985). Sindbis virus proteins nsP1 and nsP2 contain homology to nonstructural proteins from several RNA plant viruses. J. Virol. 53, 536-542.
BRIAND, J.-P., KEITH, G., and GUILLEY, H. (1978). Nucleotide sequence at the 5’ extremity of turnip yellow mosaic virus genome and coat mRNA. Proc. Natl. Acad. Sci. USA 75, 3168-3172.
DASGUPTA, R., and KAESBERG, P. (1982). Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus. Nucleic Acids Res. 10, 703-713.
DEVEREAUX, J., HAEBERLI, P., and SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
DOMIER, L. L., SHAW, J. G., and RHOADS, R. E. (1987). Potyviral proteins share amino acid sequence homology with picorna-, como-, and caulimoviral proteins. Virology 158, 20-27.
DRIESSEN, H. P. C., DE JONG, W. W., TESSER, G. I., and BLOEMENDAL, H. (1985). The mechanism of N-terminal acetylation of proteins. CRC Crit. Rev. Biochem. 18, 281-325.
GHOSH, A., DASGUPTA, R., SALERNO-RIFE, T., RUTGERS, T., and KAESBERG, P. (1979). Southern bean mosaic viral RNA has a 5’-linked protein but lacks 3’ terminal poly(A). Nucleic Acids Res. 7, 2137-2146.
GHOSH, A., RUTGERS, T., MANG, K., and KAESBERG, P. (1981). Characterization of the coat protein mRNA of southern bean mosaic virus and its relationship to the genomic RNA. J. Virol. 39, 87-92.
GORBALENYA, A. E., BLINOV, V. M., and KOONIN, E. V. (1985). Prediction of nucleotide-binding properties of virus-specific proteins from their primary structure. Mol. Genet. Mikrobiol. Virusol. 11, 30-36.
HAGENBÜCHLE, O., SANTER, M., STEITZ, J. A., and MANS, R. J. (1978). Conservation of the primary structure at the 3’ end of 18S rRNA from eucaryotic cells. Cell 13, 551-563.
HASELOFF, J., GOELET, P., ZIMMERN, D., AHLQUIST, P., DASGUPTA, R., and KAESBERG, P. (1984). Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization. Proc. Natl. Acad. Sci. USA 81, 4358-4362.
HERMODSON, M. A., ABAD-ZAPATERO, C., ABDEL-MEGUID, S. S., PUNDAK, S., ROSSMANN, M. G., and TREMAINE, J. H. (1982). Amino acid sequence of southern bean mosaic virus coat protein and its relation to the three-dimensional structure of the virus. Virology 119, 133-149.
HIGGINS, C. F., HILES, I. D., SALMOND, G. P. C., GILL, D. R., DOWNIE, J. A., EVANS, I. J., HOLLAND, I. B., GRAY, L., BUCKEL, S. D., BELL, A. W., and HERMODSON, M. A. (1986). A family of related ATP-binding subunits coupled to many distinct biological processes in bacteria. Nature (London) 323, 448-450.
HSU, C. H., WHITE, J. A., and SEHGAL, O. P. (1977). Assembly of southern bean mosaic virus from its two subviral intermediates. Virology 81, 471-475.
HULL, R. (1977). The banding behaviour of the viruses of southern bean mosaic virus group in gradients of caesium sulphate. Virology 79, 50-57.
KOPER-ZWORTHOFF, E. C., BREDERODE, F. T., VEENEMAN, G., VAN BOOM, J. H., and BOL, J. F. (1980). Nucleotide sequences at the 5’-termini of the alfalfa mosaic virus RNAs and the intercistronic junction in RNA 3. Nucleic Acids Res. 8, 5635-5647.
KOZAK, M. (1983). Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles. Microbiol. Rev. 47, 1-45.
KRAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7281.
LÜTCKE, H. A., CHOW, K. C., MICKEL, F. S., MOSS, K. A., KERN, H. F., and SCHEELE, G. A. (1987). Selection of AUG initiation codons differs in plants and animals. EMBO J. 6, 43-48.
MANG, K., GHOSH, A., and KAESBERG, P. (1982). A comparative study of the cowpea and bean strains of southern bean mosaic virus. Virology 116, 264-274.
MANIATIS, T., FRITSCH, E. F., and SAMBROOK, J. (1982). In “Molecular Cloning: A Laboratory Manual,” pp. 98-106. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
MAXAM, A., and GILBERT, W. (1980). In “Methods in Enzymology” (L. Grossman and K. Moldave, Eds.), Vol. 65, pp. 499-560. Academic Press, New York.
McCAIN, D. C., VIRUDACHALAM, R., MARKLEY, J. L., ABDEL-MEGUID, S. S., and ROSSMANN, M. G. (1982). Carbon-13 NMR study of southern bean mosaic virus. Virology 117, 501-503.
RICE, C. M., and STRAUSS, J. H. (1981). Nucleotide sequence of 26S mRNA of Sindbis virus and deduced sequence of the encoded virus structural proteins. Proc. Natl. Acad. Sci. USA 78, 2062-2066.
ROSSMANN, M. G., ABAD-ZAPATERO, C., HERMODSON, M. A., and ERICKSON, J. W. (1983). Subunit interactions in southern bean mosaic virus. J. Mol. Biol. 166, 37-83.
RUECKERT, R. R. (1985). Picornaviruses and their replication. In “Virology” (B. N. Fields, Ed.), pp. 705-738. Raven Press, New York.
SALERNO-RIFE, T., RUTGERS, T., and KAESBERG, P. (1980). Translation of southern bean mosaic virus RNA in wheat embryo and rabbit reticulocyte extracts. J. Virol. 34, 51-58.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
SAVITHRI, H. S., and ERICKSON, J. W. (1983). The self-assembly of the cowpea strain of southern bean mosaic virus: Formation of T = 1 and T = 3 nucleoprotein particles. Virology 126, 328-335.
SEHGAL, O. P. (1981). Southern bean mosaic virus group. In “Handbook of Plant Virus Infections and Comparative Diagnosis” (E. Kurstak, Ed.), pp. 91-121. Elsevier/North-Holland Biomedical, New York.
SKERN, T., SOMMERGRUBER, W., BLAAS, D., GRUENDLER, P., FRAUNDORFER, F., PIELER, C., FOGY, I., and KUECHLER, E. (1985). Human rhinovirus 2: Complete nucleotide sequence and proteolytic processing signals in the capsid protein region. Nucleic Acids Res. 13, 2111-2126.
STADEN, R. (1980). A new computer method for the storage and manipulation of DNA gel reading data. Nucleic Acids Res. 8, 3673-3694.
STADEN, R. (1982). An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. Nucleic Acids Res. 10, 2951-2961.
STANWAY, G., HUGHES, P. J., MOUNTFORD, R. C., MINOR, P. D., and ALMOND, J. W. (1984). The complete nucleotide sequence of a common cold virus: Human rhinovirus 14. Nucleic Acids Res. 12, 7859-7875.
STRAUSS, E. G., and STRAUSS, J. H. (1983). Replication strategies of the single-stranded RNA viruses of eukaryotes. In “Current Topics in Microbiology and Immunology,” Vol. 105, pp. 1-98. Springer-Verlag, Berlin/Heidelberg.
TAYLOR, J. M., ILLMENSEE, R., and SUMMERS, J. (1976). Efficient transcription of RNA into DNA by avian sarcoma virus polymerase. Biochim. Biophys. Acta 442, 324-330.
VEERISETTY, V., and SEHGAL, O. P. (1979). Genome-linked proteinase K-sensitive factor essential for the infectivity of southern bean mosaic virus. Phytopathology 69, 1048.
ZIMMERN, D. (1975). The 5’ end group of tobacco mosaic virus RNA is m7G(5’)pppGp. Nucleic Acids Res. 2, 1189-1201.
ZUKER, M., and STIEGLER, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133-148.