Partial nucleotide sequence of the Murray Valley encephalitis virus genome


J. Mol. Biol. (1986) 187, 309-323

Partial Nucleotide Sequence of the Murray Valley Encephalitis Virus Genome

Comparison of the Encoded Polypeptides with Yellow Fever Virus Structural and Non-structural Proteins

Lynn Dalgarno1†, Dennis W. Trent2, James H. Strauss1 and Charles M. Rice1‡

1Division of Biology, California Institute of Technology, Pasadena, CA 91125, U.S.A.

2Division of Vector-Borne Viral Diseases, Center for Infectious Diseases, Center for Disease Control, Fort Collins, CO 80522, U.S.A.

(Received 13 August 1985)

The sequence of 5400 bases corresponding to the 5'-terminal half of the Murray Valley encephalitis virus genome has been determined. The genome contains a 5' non-coding region of about 97 nucleotides, followed by a single continuous open reading frame that encodes the structural proteins followed by the non-structural proteins. Amino acid sequence homology between the Murray Valley encephalitis and yellow fever (Rice et al., 1985) polyproteins is 42% over the region sequenced. The start points of the various Murray Valley encephalitis virus-coded proteins have been assigned on the basis of this homology and a consistent set of potential proteolytic cleavage sites identified, the sequences of which are similar in Murray Valley encephalitis and yellow fever. The deduced Murray Valley encephalitis gene order is 5'-C-prM(M)-E-NS1-ns2a-ns2b-NS3-3'. The genome organization of Murray Valley encephalitis and yellow fever appears to be identical and the sizes of the predicted virus-coded proteins are similar between the two viruses. Both viruses encode a basic capsid protein followed by three glycoproteins; the glycoproteins appear to have the conventional topology of N terminus outside with a C-terminal membrane-spanning domain. There are conserved glycosylation sites in prM, the precursor to the M protein of the virion, and in NS1, a non-structural protein of uncertain function. The glycosylation sites in E, the major envelope protein of the virion, are not conserved as to position. We predict the existence, in flavivirus-infected cells, of two small, hydrophobic peptides, ns2a and ns2b, which show only limited amino acid sequence homology. Finally, about half of the amino acid sequence of NS3 has been obtained; NS3 is a hydrophilic non-structural protein that shows 55% amino acid sequence similarity between Murray Valley encephalitis and yellow fever over the region sequenced and is probably involved in RNA replication.

1. Introduction

The Flavivirus genus, which has recently been given family status as the sole genus in the family Flaviviridae, contains some 70 members, many of which are important human or veterinary pathogens. Most flaviviruses are transmitted by blood-sucking arthropods, mosquitoes or ticks, although some members of this group apparently lack an arthropod vector. There is considerable interest in the flaviviruses as disease organisms, since many of them cause widespread cases of serious human illness (Shope, 1980). The mosquito-borne flaviviruses have been divided into three subgroups on the basis of serological cross-reactions (Porterfield, 1980). These are the yellow fever (YF)†, dengue and Japanese encephalitis (JE) virus groups. The JE group includes Murray Valley encephalitis (MVE), St Louis encephalitis (SLE) and West Nile (WN) viruses, with a number of common epidemiological, ecological and biological features including the capacity to cause encephalitis in humans (Shope, 1980).

‡ Present address: Department of Microbiology and Immunology, Washington University School of Medicine, St Louis, MO 63110, U.S.A.

MVE is active in Irian Jaya, Papua New Guinea and Australia. The virus is neurotropic and was first isolated from the brains of fatal human cases in Australia in 1951 (French, 1952; Miles et al., 1951). MVE has been responsible for a number of outbreaks of encephalitis in southeastern Australia, the most important recent outbreaks being those of 1951 and 1974 (for a review, see Doherty, 1977). Although most human infection with MVE is inapparent, with ratios of clinical infection to subclinical infection probably around 1:500 to 1:1500 (Anderson et al., 1952), the case mortality rate in the 1974 epidemic was 20% (Doherty, 1977). MVE exists in an enzootic cycle in northern Australia, with Culex annulirostris as the principal mosquito vector, although in northern Australia the virus has been isolated also from Culex bitaeniorhynchus and Aedes normanensis (Doherty et al., 1963; Chamberlain, 1980). Birds, mainly water birds, play a primary role in the ecology of MVE (and Kunjin virus) in Australia, similar to the role of birds in the ecology of JE, SLE and WN. In seasons of unusual rainfall pattern, MVE activity extends from enzootic foci in northern Australia and it has been proposed that epidemics in the south of the continent result from such movements (see Doherty, 1977). However, recent studies (Marshall et al., 1982) suggest that, during interepidemic periods, enzootic foci probably exist in the southern area and that these could seed epidemics in these areas.

Structurally, all flaviviruses appear to be very similar (for a review, see Murphy, 1980; Russell et al., 1980; Westaway, 1980).
They contain single-stranded RNA, of approximately 11 kb in length, that is infectious, lacks poly(A), and possesses a 5' cap of type I (Wengler et al., 1978; Cleaves & Dubin, 1979; Wengler & Wengler, 1981). The RNA is contained in a nucleocapsid believed to possess icosahedral symmetry with a single species of capsid protein C, of molecular weight about 14,000. The capsid is enveloped by a lipid bilayer in which are embedded two virus-specific proteins, a membrane protein M that lacks carbohydrates and has an Mr of about 8000 and an envelope protein E of Mr about 60,000 that may or may not contain attached carbohydrate chains. Acquisition of the membrane by the virus occurs in association with internal

† Abbreviations used: YF, yellow fever virus; JE, Japanese encephalitis virus; MVE, Murray Valley encephalitis virus; SLE, St Louis encephalitis virus; WN, West Nile virus; kb, 10³ bases or base-pairs; cDNA, complementary DNA; SCF, soluble complement-fixing antigen.


Huang). The ligated mixture was used to transform competent (Dagert & Ehrlich, 1979) Escherichia coli strain MC1061.1 ((Casadaban & Cohen, 1980); MC1061 transduced to tetracycline resistance (Tn10) and ultraviolet sensitivity (recA56) from E. coli strain C600). Ampicillin-resistant colonies were screened by sizing of supercoiled plasmids. cDNA clones were analyzed by preliminary restriction analysis and limited chemical sequencing of their termini. After computer translation in all possible reading frames these sequences could be aligned by homology with the yellow fever virus polyprotein (Rice et al., 1985). Using these approaches, a set of overlapping cDNA clones, 2 to 7 kb in length, has been characterized (Rice et al., unpublished results). Of these clones, 2 designated 2/1/22 (5.4 kb) and 1/1/12 (6.5 kb) overlap and cover approximately 6.7 kb of the 5' region of the MVE genome and were selected for sequence analysis.
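The computer translation step mentioned above can be illustrated with a minimal Python sketch. It is not the program used by the authors: the function names and the short example fragment are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; unknown codons give 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(sequence):
    """Translate a DNA or RNA sequence in all six reading frames."""
    dna = sequence.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical fragment chosen so that frame +1 reads Met-Ser-Lys-Lys-Pro.
    for name, protein in six_frame_translation("ATGTCTAAAAAACCA").items():
        print(name, protein)
```

Each of the six translations could then be compared with a reference polyprotein to decide which frame, if any, is homologous.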

The base-specific chemical cleavage method, essentially as described by Maxam & Gilbert (1980), was used throughout. End-labeled restriction fragments were produced by fill-in synthesis with the Klenow fragment of E. coli polymerase I (for 3' recessed ends) or brief exonuclease treatment (2 min) with T4 DNA polymerase ([?].5 units/μg DNA) in the absence of deoxyribonucleotides followed by replacement synthesis in the presence of 1 of the [α-32P]deoxynucleoside triphosphates and a vast excess

To map the 5' end of MVE RNA, 23 pmol of the synthetic oligodeoxynucleotide 5'-T-T-T-G-A-A-A-T-C-A-A-A-A-G-C-3' was phosphorylated with [γ-32P]ATP and T4 polynucleotide kinase (Maxam & Gilbert, 1980). An approximately 10-fold molar excess of this end-labeled primer was used for reverse transcription of MVE genomic RNA using first-strand cDNA synthesis conditions described above and by Rice & Strauss (1981b). The primer extension products were analyzed by electrophoresis on acrylamide/urea sequencing gels (Maxam & Gilbert, 1980), and the major bands were excised, eluted (Rice & Strauss, 1981b), and sequenced as described above.
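As an illustration only, the stretch of RNA to which the 15-residue primer 5'-TTTGAAATCAAAAGC-3' anneals can be derived by taking its reverse complement. The helper names below are hypothetical, and the annealing span assumes that a primer-extension product of the stated length reaches the exact 5' end of the template.

```python
PRIMER = "TTTGAAATCAAAAGC"  # written 5' to 3', hyphens omitted


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primer_target_in_rna(primer_dna):
    """RNA-sense (5' to 3') sequence that is complementary to the primer."""
    return reverse_complement(primer_dna).replace("T", "U")


def annealing_span(product_length, primer_length):
    """1-based template positions covered by the primer.

    Assumes the extension product runs to template position 1, so the
    5' end of the primer pairs with position `product_length`.
    """
    return product_length - primer_length + 1, product_length


if __name__ == "__main__":
    print(primer_target_in_rna(PRIMER))
    print(annealing_span(97, len(PRIMER)))
```

Under that assumption, a 97-nucleotide product made with this primer would place it opposite nucleotides 83 to 97 of the RNA.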


3. Results

(a) Sequencing strategy

cDNA made to MVE RNA with reverse transcriptase using DNase-digested calf thymus DNA primers was cloned into a plasmid derivative of pBR322, and a number of clones containing inserts of 2 to 7 kb were screened by sequencing the ends of the inserts using a restriction site in the polylinker. The amino acid sequence of the encoded protein was deduced from the nucleotide sequence and compared to the amino acid sequence of the complete translate of YF RNA, using computer programs to detect sequence homologies. This method of alignment is similar in concept to that used previously in aligning alphavirus sequences (Dalgarno et al., 1983; Strauss et al., 1983) and in the majority of cases the MVE clones could readily be aligned with the YF RNA sequence by the amino acid sequence homology found between MVE and YF (described in more detail below). In this way MVE clones that represented almost the entire length of MVE RNA were obtained. Two of these clones, designated 2/1/22 (5.4 kb in length) and 1/1/12 (6.5 kb), overlap and cover 6.7 kb of the 5' region of MVE RNA, and these were used to obtain the nucleotide sequence of the 5'-terminal 5.4 kb of the MVE genome, using the chemical method of Maxam & Gilbert (1980). The sequencing strategy is diagrammed in Figure 1 and the sequence obtained was completely overlapped (Fig. 1). Approximately one half of the sequence was obtained for only one strand, but wherever compressions were found or wherever comparisons with the YF sequence indicated uncertainties in the sequence, the second strand was sequenced to resolve the ambiguities. The amino acid sequence deduced for MVE proteins demonstrates pronounced homology with those of YF (see below), which provides evidence for the correctness of the sequence obtained. Because most of the sequence was obtained for only one clone (Fig. 1), however, the sequence obtained may not represent the population average and could contain one or a few changes from the average, due to cloning of minor variants.

Figure 1. Sequencing strategy of MVE cDNA clones 2/1/22 (continuous lines) and 1/1/12 (broken lines). The arrows indicate positions, lengths (drawn to scale) and direction (3' to 5') of sequences determined from end-labeled restriction fragments, as described in Materials and Methods. The long open reading frame is an open box, and the positions of the MVE proteins homologous to those of YF are labeled (see the text). The alternative nomenclature for the non-structural proteins (Rice et al., 1985) is used and corresponds to the previous nomenclature (Westaway et al., 1980) as follows: prM = GP19; NS1 = gp44 or NV3; NS3 = P71 or NV4. ns2a and ns2b are hypothetical non-structural proteins and are discussed in the text.

Because of the cDNA cloning method used, the 5'-terminal clone (2/1/22) was missing some of the MVE RNA sequence at the 5' terminus. In order to estimate this length and to extend the 5' sequence of MVE RNA, the oligodeoxynucleotide 5'-T-T-T-G-A-A-A-T-C-A-A-A-A-G-C-3' was synthesized and used as a primer for reverse transcription of MVE genomic RNA. The cDNA synthesized was 97 to 98 nucleotides in length and apparently extended to the 5' end of the RNA (Fig. 2). This cDNA fragment was sequenced to obtain the 5'-terminal sequence of MVE RNA; the last four nucleotides of this sequence, corresponding to the first three or four nucleotides of MVE RNA, could not be read with accuracy, although the nucleotides could be counted. The longest primer extension product may represent a partial copying of the genomic RNA cap by reverse transcriptase (Gupta & Kingsbury, 1984). The sequence obtained is believed to represent the 5'-terminal sequence of MVE RNA since the primer-extended products terminated in two sharply defined bands (Fig. 2) and because of homologies between this nucleotide sequence and

untranslated region is 96 or 97 nucleotides before beginning the long open reading frame (see below).

The translated sequence of the first 5436 nucleotides of MVE is shown in Figure 4. This represents approximately one half of the complete sequence of the genomic RNA, and contains the entire region encoding the structural proteins as well as the non-structural proteins NS1 (formerly called NV3), ns2a, ns2b, and part of NS3 (formerly called NV4; the terminology is that of Rice et al., 1985). An open reading frame analysis of this sequence is shown in Figure 5. As was previously found for YF (Rice et al., 1985), there is one long open reading frame that begins with an AUG codon at nucleotides 98 to 100 and continues to the end of the sequence obtained. The MVE 5' untranslated region of 96 to 97 nucleotides compares with 118 nucleotides in YF (Rice et al., 1985).
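An open reading frame analysis of this kind amounts to recording, for each reading frame, where termination codons fall and where the first AUG lies. The sketch below shows the idea for the three forward frames of a short, made-up RNA; the function names and the toy sequence are assumptions, not the authors' software or data.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def _as_rna(sequence):
    """Normalise a DNA or RNA string to upper-case RNA."""
    return sequence.upper().replace("T", "U")


def stop_positions(sequence, frame):
    """1-based start positions of stop codons in forward frame 0, 1 or 2."""
    rna = _as_rna(sequence)
    return [
        i + 1
        for i in range(frame, len(rna) - 2, 3)
        if rna[i:i + 3] in STOP_CODONS
    ]


def first_aug(sequence, frame):
    """1-based position of the first AUG in the given frame, or None."""
    rna = _as_rna(sequence)
    for i in range(frame, len(rna) - 2, 3):
        if rna[i:i + 3] == "AUG":
            return i + 1
    return None


def untranslated_length(aug_position):
    """Number of nucleotides that precede an AUG at a 1-based position."""
    return aug_position - 1


if __name__ == "__main__":
    toy = "GCUUAGAUGUCUAAAAAACCAGGA"  # hypothetical sequence
    for frame in range(3):
        print(frame, stop_positions(toy, frame), first_aug(toy, frame))
    print(untranslated_length(98))
```

With the numbering used in the text, an AUG beginning at nucleotide 98 corresponds to 97 preceding nucleotides. The complementary strand could be scanned in the same way by passing its reverse complement.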


Figure 2. Mapping and sequence of the 5' terminus of MVE RNA. An end-labeled synthetic oligonucleotide, complementary to the 5'-terminal region of MVE RNA, was used to prime cDNA synthesis on MVE genomic RNA. (a) The primer extended products (lane 1), with sequence ladders as markers (lanes 2 and 3). (b) The longest primer extension products (shown by an arrow in (a)) were isolated and sequenced using the base-specific chemical cleavage method. A shorter exposure is shown in (c), in which 2 intense bands at the top of the sequence ladder are indicated by arrows (see the text for discussion).

[Figure 3 alignment. MVE 5' [N]NNNACGUUCAUCUGCGUGAGCUU; WN 5' (sequence not legible); YF 5' AGUAAAC---CCUGUGUGCUAAU]

Figure 3. Flavivirus 5'-terminal sequences. The MVE 5' sequence as determined in Fig. 2 is aligned with the 5'-terminal RNA sequence of WN genomic RNA (Wengler & Wengler, 1981) and the YF 5'-terminal RNA sequence, as deduced from sequence analysis of cloned cDNA (Rice et al., 1985). The length of the MVE sequence that has not been determined is probably 3 or 4 nucleotides (see the text) and those ambiguous positions are denoted with N. Gaps have been introduced to maximize homology (indicated by dashes), and positions which are identical between sequences are marked by asterisks. Sequence hyphens have been omitted here, and in other Figures, for clarity.
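The asterisk convention described in this legend can be reproduced with a few lines of Python. This is a sketch under the assumption that the two sequences are already aligned and padded with dashes; the function name and the example strings are hypothetical and are not the sequences of Figure 3.

```python
def identity_markers(aligned_a, aligned_b, gap="-"):
    """Return a marker line with '*' under identical, non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(aligned_a, aligned_b)
    )


if __name__ == "__main__":
    top = "ACGU-UCAU"      # hypothetical aligned sequence
    bottom = "ACGAAUC-U"   # hypothetical aligned sequence
    print(top)
    print(identity_markers(top, bottom))
    print(bottom)
```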

The start points of the various MVE proteins have been assigned from the homology between the MVE polyprotein and that of YF (Rice et al., 1985) or, in the case of prM, homology with WN (Castle et al., 1985). A direct comparison of the amino acid sequences of MVE and YF is shown in Figure 6. Gaps have been introduced in several places to maintain the alignment. Different virus proteins show different levels of homology, and even within a protein there are conserved domains and domains that show little conservation (Fig. 6). Overall, the two viruses exhibit 42% amino acid sequence homology over the region sequenced (counting gaps as mismatches) and thus are closely related and in all probability have descended from a common ancestor. It is of interest that the structural protein region, including NS1, contains highly conserved cysteine residues (Fig. 6), whereas the remainder of the non-structural proteins so far analyzed does not show such marked conservation of cysteines.

The organization of the MVE genome is the same as that previously found for YF (Rice et al., 1985). The structural proteins and their precursors are found at the 5' end of the genome and the non-structural proteins are encoded in the 3'-terminal three-fourths of the genome. The structural proteins are encoded in the order 5'-C-prM(M)-E-3', where C is the capsid protein, prM is a glycosylated precursor to the membrane protein M (see below), and E is the envelope protein (prM was previously referred to as GP19 or NV2; see Rice et al., 1985). The non-structural proteins follow E.

To illustrate the fact that some domains between MVE and YF proteins are highly conserved and others are not, the amino acid sequence homology is plotted in Figure 7(a) as a moving average with a string length of 20 amino acids. With this window, the sequence homology varies from virtually 0% to almost 100%, depending upon the domain. These homologies are discussed in more detail below, where the functions of the individual proteins are considered. Also in Figure 7, a hydrophobicity plot of MVE proteins (c) is aligned with one for YF proteins (b). Several points emerge from an examination of the sequence homology in comparison with the hydrophobicity analyses. First, the hydrophobicity profiles of the two virus polyproteins are very similar, even in regions where amino acid sequence is not conserved. This is particularly striking in the region of the genome encoding potential non-structural proteins ns2a and ns2b (see below) where the hydrophobicity profiles are superimposable, even though the amino acid sequence is not particularly conserved. Second, there are hydrophobic domains that precede the N termini of prM, of E, and of NS1 (NV3) that could serve as internal signal sequences for insertion of the glycoproteins into the endoplasmic reticulum and as C-terminal membrane-spanning domains to anchor the proteins in the membrane (discussed in more detail below). There is also a hydrophobic domain preceding ns2a that could act as a C-terminal anchor for NS1 (and conceivably as a signal sequence for ns2a).
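The moving-average homology plot described here can be sketched as follows, assuming a pre-computed pairwise alignment in which gaps are written as dashes and counted as mismatches. The function names and the short example alignment are hypothetical; the window of 20 residues follows the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns that are identical.

    Columns containing a gap are kept in the denominator, so gaps
    count as mismatches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def moving_identity(aligned_a, aligned_b, window=20):
    """Percent identity in every full window along the alignment."""
    return [
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(len(aligned_a) - window + 1)
    ]


if __name__ == "__main__":
    a = "MKTAYIAK"  # hypothetical aligned fragment
    b = "MKT-YLAK"  # hypothetical aligned fragment
    print(percent_identity(a, b))
    print(moving_identity(a, b, window=4))
```

A small window is used in the example only because the toy alignment is short.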

We have proposed that flavivirus RNAs are translated to produce a single large polyprotein that is cleaved post-translationally to produce the various flaviviral proteins and suggested a nomenclature for naming flavivirus proteins based on gene order (Rice et al., 1985). The start points of several of the YF proteins have been positioned by N-terminal sequence analysis. This allows the identification of a consistent set of potential cleavage sites in the YF polyprotein that could explain the generation of YF structural and non-structural proteins. From the homology of the MVE polyprotein with that of YF we have identified corresponding sites in MVE that could be involved in similar cleavages. These are shown in Figures 4 and 6, and the homologies around these potential cleavage sites for a number of flaviviruses are shown in Table 1.

The MVE polyprotein begins with the sequence of the capsid protein. From protein sequence data for YF it was found that the initiating methionine for the polyprotein is removed, since the capsid protein begins with the serine that follows the initiating methionine (Bell et al., 1985). We assume that this methionine is removed in the case of all flaviviruses (see also Boege et al., 1983; Table 1). The removal of this methionine may be catalyzed by a cellular methionine aminopeptidase, since the initiating methionine is removed from a number of cellular proteins as well. We also note that there are conserved methionines at position 15 (MVE) or 14 (YF) that are found in more favorable initiation contexts (Kozak, 1984) and that could conceivably be used as alternative initiation sites. The capsid protein in YF virions begins only with the serine at position 2 (Bell et al., 1985), but alternative forms of the capsid protein have been found in flavivirus-infected cells (Westaway et al., 1977; Wengler et al., 1979; Svitkin et al., 1981, 1984). A similar hypothesis has been proposed by Castle et al. (1985).

Figure 5. Open reading frame analysis. Graphic representations of the distribution of termination codons (vertical lines) in all possible reading frames and the first methionine codon in each open reading frame (asterisks) are shown for the sequence in Fig. 4 and its complement.

Proteins prM, the precursor to the M protein which is found in the mature virion, E and NS1 are all glycoproteins that follow one another in the polyprotein sequence and we have proposed that they are each integral membrane proteins that are inserted one after the other into the endoplasmic reticulum during synthesis (Rice et al., 1986). The primary sites of cleavage to separate the capsid protein and the glycoproteins from one another have been located from the N-terminal amino acid sequence of YF E (Bell et al., 1985) and NS1 (J. Pata, J. Schlesinger, R. Aebersold, D. Teplow & C. M. Rice, unpublished results) and of WN prM (Castle et al., 1985). Cleavage occurs after short side-chain amino acids and could be accomplished by the cellular enzyme(s) termed signalase, in the lumen of the endoplasmic reticulum, that functions to remove signal sequences. This hypothesis is supported by results from translation in vitro of flavivirus RNA, in which Svitkin et al. (1984) found that membranes were required to process the structural proteins of tick-borne encephalitis virus. In the absence of membranes, no processing occurred, which is the expected result if signalase performed the cleavages. Thus the hydrophobic regions found just before the N terminus of prM, E and NS1 (Fig. 7) could function as internal signal sequences to insert the following protein into the membrane and as C-terminal hydrophobic belts to anchor C, prM and E in the membrane. There is also a hydrophobic region preceding ns2a that could anchor NS1 in the membrane and/or lead to the insertion of ns2a into the membrane. Because the C termini of these proteins have not been mapped, it is not known at present whether C, prM, E, or NS1 extend to the N termini of the following proteins indicated in the genomic map or if there are peptides excised from between any two proteins. Such excision occurs with the Mr 6000 protein of alphaviruses, located between glycoproteins E2 and E1, which functions as an internal signal sequence and which is probably removed by signalase (Rice & Strauss, 1981a; Garoff et al., 1980; Welch & Sefton, 1979).

The cleavage of prM to produce M apparently occurs during virus maturation (Shapiro et al., 1972; Russell et al., 1980), and no M has been detected in infected cells (Westaway, 1980). The cleavage occurs after the sequence Arg-Ser-Arg-Arg for YF and also presumably for MVE. This type of late cleavage event after the canonical sequence Arg-X-Arg/Lys-Arg (Dalgarno et al., 1983) occurs in the case of many virus glycoproteins and may be catalyzed by a cellular protease associated with the Golgi apparatus (for a review, see Strauss & Strauss, 1985).

Finally, it appears that the non-structural proteins of flaviviruses other than NS1 may be cleaved by a virus-encoded protease that cleaves after double basic residues that are preceded and/or followed by short side-chain amino acids, often glycine or serine (Rice et al., 1985). For YF, cleavage occurs after two Arg residues in each case (Rice et al., 1985). As seen in Table 1 the two putative sites identified for MVE are both Lys-Arg followed by Gly.
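The two sequence patterns discussed above lend themselves to a simple motif scan. The sketch below is illustrative only: the function names and test peptides are hypothetical, and the second pattern is a deliberately simplified reading of "double basic residues followed by a short side-chain amino acid", restricted here to Lys-Arg or Arg-Arg followed by Gly or Ser.

```python
import re

# Arg-X-Arg/Lys-Arg; a lookahead is used so that overlapping matches are found.
LATE_MOTIF = re.compile(r"(?=R.[KR]R)")
# Lys-Arg or Arg-Arg followed by Gly or Ser (simplifying assumption).
DIBASIC_MOTIF = re.compile(r"(?=[KR]R[GS])")


def late_cleavage_sites(protein):
    """1-based positions of the final Arg of each Arg-X-Arg/Lys-Arg match.

    Cleavage would follow the residue at the returned position.
    """
    return [m.start() + 4 for m in LATE_MOTIF.finditer(protein.upper())]


def dibasic_sites(protein):
    """1-based positions of the second basic residue of each dibasic match."""
    return [m.start() + 2 for m in DIBASIC_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    print(late_cleavage_sites("AARSRRSLT"))  # hypothetical peptide
    print(dibasic_sites("TKRGGV"))           # hypothetical peptide
```

A scan of this kind only proposes candidate sites; as the text notes, the assignments in Table 1 rest on homology with sites confirmed by N-terminal sequencing.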

(e) The MVE capsid protein

The MVE capsid protein is quite basic in character; [?]% of the residues are Lys or Arg. The flavivirus capsid protein enwraps the RNA in a nucleocapsid having icosahedral symmetry. The function of these basic residues must be, at least in part, to neutralize negative charges on the RNA and possibly to facilitate protein-RNA interactions. Unlike the situation for alphaviruses, which also have a nucleocapsid with icosahedral symmetry, and in which the basic residues are for the most part clustered in the N-terminal third of the capsid protein (Dalgarno et al., 1983), the basic residues in the flavivirus capsid protein are for the most part distributed over the entire length of the protein (Fig. 4). The C terminus of MVE C, assuming it extends to the beginning of prM, is uncharged and the sequence homology with YF is very low, consistent with its probable function as a signal sequence to insert prM into the endoplasmic reticulum. The overall homology between the MVE and YF capsid proteins is low, only 27% counting gaps as mismatches (Figs 4 and 6). The most highly conserved domain is between residues 42 and 59 of MVE C, a region containing only two charged residues in MVE. This domain could be involved in protein-protein interactions or in specific RNA-protein interactions required for the assembly of the nucleocapsid and the acquisition of the lipoprotein envelope by the capsid.

Figure 4. Nucleotide sequence of the 5' region of MVE genomic RNA. Nucleotides are numbered from the beginning; amino acids are numbered from the first methionine in the long open reading frame. The putative start points of the encoded MVE proteins are labeled, with uncertain N termini marked by broken arrows. The nomenclature is described in the legend to Fig. 1 and the text. Possible membrane-associated segments of the structural proteins or their precursors and of NS1 are overlined, and potential N-linked glycosylation sites are denoted by asterisks.
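The two observations made about the capsid protein, its high content of Lys and Arg and the spread of these residues along the chain, can both be checked with a short calculation. The sketch below uses hypothetical function names and a made-up peptide, not the MVE capsid sequence.

```python
BASIC = "KR"


def basic_fraction(protein):
    """Fraction of residues that are Lys or Arg."""
    protein = protein.upper()
    return sum(protein.count(residue) for residue in BASIC) / len(protein)


def basic_by_thirds(protein):
    """Count of Lys plus Arg in the N-terminal, middle and C-terminal thirds."""
    protein = protein.upper()
    n = len(protein)
    bounds = [0, n // 3, 2 * n // 3, n]
    return [
        sum(1 for residue in protein[bounds[i]:bounds[i + 1]] if residue in BASIC)
        for i in range(3)
    ]


if __name__ == "__main__":
    peptide = "SKKPGGAAARLLKVVR"  # hypothetical sequence
    print(basic_fraction(peptide))
    print(basic_by_thirds(peptide))
```

Roughly equal counts in the three segments would correspond to the distributed pattern described for flaviviruses, whereas a count concentrated in the first segment would resemble the alphavirus pattern.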

The prM proteins of MVE and YF also show fairly low homology, [?]3% over the entire protein and 36% in the M region (Fig. 6). The homology is strikingly non-uniform (Fig. 7), with some domains demonstrating [?]0% or greater homology and others 10% or less homology. The C-terminal domain of prM (assuming that prM extends to the beginning of E) is hydrophobic in character (Fig. 7) and, in the C-terminal 37 residues, only one is charged (a lysine in MVE that corresponds to an arginine in YF). This region may function both as a membrane-spanning domain and also as an internal signal sequence for insertion of E. However, since this region shows rather higher sequence conservation between MVE and YF than would be expected if its only functions were as a transmembrane anchor and signal sequence, it may have other functions as

Figure 7. Homology and hydrophobicity analyses. (a) A moving percentage homology plot of the MVE and YF sequences as aligned in Fig. 6, using a window of 20 amino acid residues. (b) and (c) Hydrophobicity analyses of YF and MVE, respectively, using the program of Kyte & Doolittle (1982) and a search length of 7 residues. Increasingly hydrophobic regions are shown above the midline (an average hydropathy value) and are shaded. Putative hydrophobic membrane-associated domains of the structural proteins and NS1 are indicated by filled bars; potential N-linked glycosylation sites are denoted by asterisks, and where conserved in position between YF and MVE are also shown in (a). The nomenclature for the proteins is described in the legend to Fig. 1 and the text.
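A hydropathy profile of the kind described in this legend can be approximated by averaging Kyte & Doolittle hydropathy values over a sliding window of 7 residues. The sketch below is not the original program; the function name and test peptides are hypothetical, and only the published scale values and the window length are taken as given.

```python
# Kyte & Doolittle (1982) hydropathy values, one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every full window along the sequence.

    Positive values indicate hydrophobic stretches. Residues outside the
    20 standard amino acids raise a KeyError.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
    for value in hydropathy_profile("KRDELLIVVALLGKRE"):
        print(round(value, 2))
```

Plotting the returned values against window position gives a profile in which candidate membrane-spanning segments appear as sustained positive runs.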

Table 1
N-terminal cleavage sites

Protein   Virus   Preceding residues   N-terminal residues
C         MVE     M                    SKKPGG
          YF      M                    SGRKAQ
          SLE     M                    ?KKPGK
          TBE     M                    VKKAI
          WN      M                    SKKPGG
prM       MVE     IFMLIGFAAA           LKLSTFQGKI?MT
          YF      ILGMLLMTGG           VTLVRKNRWLLL?
          WN      LLGLIACAGA           VTLSNFQGKYMMT
M         MVE     SKRSRR               SLTVQTHGESTL?EK
          YF      ?RRSRR               AIDLPTHENHGLKTR
          SLE     ?                    ?ISVQHHGD?LAPKN
          WN      ?                    SLTVQTHGESTLASK
E         MVE     VAPAYS               F?CLGM?SRDFI
          YF      VGPAYS               AHCIGITDRDFI
          SLE     ?                    FNCLGTSNRDFV
          DEN2    ?                    ?IGISNRDFV
          WN      ?                    ?
NS1       MVE     LATSVHA              DTGCAI
          YF      LSLGVGA              DQGCAI
(ns2a)    MVE     ?                    ?
          YF      ?                    ?
(ns2b)    MVE     LKYTKR               GWPA??
          YF      VRGARR               SIPV??
NS3       MVE     ?IGPNKKR             GGVFWD
          YF      ????GRR              SGDVLWD

(? marks residues or entries that are not legible.)

Putative N-terminal flavivirus protein cleavage sites (for discussion, see the text). Amino acid residues determined by N-terminal sequence analysis are shown in boldface type. Protein sequence data for the structural proteins (C, M, E) of YF, dengue 2 (DEN2) and SLE are from Bell et al. (1985). From sequence homologies with the other flaviviruses it seems clear that the sequence reported in this paper for prM (NV2) of SLE belongs instead to M as shown here, and those for WN are from Castle et al. (1985). The tick-borne encephalitis virus (TBE) C sequence is from Boege et al. (1983). The YF non-structural N-terminal protein sequences are unpublished results of J. Pata, J. Schlesinger, R. Aebersold, D. Teplow and C. M. Rice. The remaining sequences were deduced solely from the YF (Rice et al., 1985), MVE (Fig. 4) or WN (Castle et al., 1985) RNA sequences. Parentheses for ns2a and ns2b indicate that these cleavage sites have not been confirmed but are based on homology with confirmed cleavage sites and the sizes of yellow fever-specific polypeptides observed in infected cells (Schlesinger et al., 1983; J. Pata & C. M. Rice, unpublished results).

well, for example to interact specifically with the E protein and/or the nucleocapsid during virus assembly.

There is a domain in prM of about 30 amino acids that is highly homologous between MVE and YF. It precedes the N terminus of M and includes the first few residues of M. It contains five conserved Cys residues, two of which are in a domain of six conserved amino acids that includes the sequence Cys-Trp. Since this sequence is often found at the active site of thiol proteases (Takio et al., 1983), prM may function as a protease to cleave the capsid protein from the nascent polyprotein and/or conceivably function to cleave prM itself to M. Protein prM is a glycoprotein (Shapiro et al., 1972; Westaway, 1975; for a review, see Russell et al., 1980; Westaway, 1980), and MVE prM has one possible attachment site of the form Asn-X-Thr/Ser for an N-linked oligosaccharide, Asn15. This Asn residue is found in a region of prM that is poorly conserved between MVE and YF, but the carbohydrate attachment site itself is conserved (Fig. 6). (YF prM has two additional sites for possible carbohydrate addition that are not present in MVE.)

The MVE protein M is derived from the C terminus of prM and is quite short, 75 residues if it extends to the start of E, consistent with its estimated molecular weight, from acrylamide gels, of about 8000. As discussed above, the C-terminal domain is hydrophobic and M is probably an integral membrane protein, having one or more transmembrane segments. Consistent with the fact that M is not glycosylated, MVE M has no attachment sites for N-linked oligosaccharides.
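Potential attachment sites of the form Asn-X-Thr/Ser can be located with a simple pattern search. The sketch below uses a hypothetical function name and a made-up peptide; the optional exclusion of proline at position X is an added assumption based on the commonly used definition of the sequon and is not stated in the text.

```python
import re


def sequon_positions(protein, exclude_proline=False):
    """1-based positions of Asn residues in Asn-X-Thr/Ser tripeptides.

    With exclude_proline=True, sites with Pro at position X are skipped
    (an assumption beyond the form given in the text).
    """
    middle = "[^P]" if exclude_proline else "."
    # Lookahead keeps overlapping sites such as Asn-Asn-Thr-Ser.
    pattern = re.compile(f"(?=N{middle}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    peptide = "AANVTGGNPSLLNKS"  # hypothetical sequence
    print(sequon_positions(peptide))
    print(sequon_positions(peptide, exclude_proline=True))
```

Such a scan identifies only possible sites; whether a given Asn actually carries carbohydrate has to be established experimentally.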

The MVE E protein is 501 residues in length if it extends to the start of NS1, consistent with molecular weight estimates of about 60,000 for flavivirus E proteins (Westaway, 1975; Russell et al., 1980; Westaway, 1980; Heinz & Kunz, 1982). It has a single site for glycosylation of the form Asn-X-Thr/Ser, at Asn153. As was the case with prM, this glycosylation site is in a region of low homology between MVE and YF. Interestingly, the glycosylation site itself is not conserved (YF has a possible glycosylation site at Asn309). In view of this lack of conservation, it is of interest that flavivirus E proteins have been found in both glycosylated and non-glycosylated forms (Wright et al., 1981; Wright, 1981; Schlesinger et al., 1983). Virion and cell-associated forms of E are glycosylated in MVE grown in Vero cells (Westaway, 1975; Wright, 1982).

At the C terminus of E (assuming it extends to the N terminus of NS1) there are two hydrophobic domains punctuated by basic amino acids (Figs 6 and 7). As postulated for prM, this domain presumably functions to anchor E in the lipid bilayer and, as a signal sequence, to promote the translocation of NS1 across the endoplasmic reticulum. In contrast to prM, the C-terminal domain of MVE E shows very little sequence conservation with that of YF (Figs 6 and 7), which is consistent with its function as a simple transmembrane segment or signal sequence.

Comparing MVE and YF, regions of E show remarkable amino acid sequence conservation while others show virtually complete divergence (Figs 6 and 7). For example, residues 92 to 123 of MVE E are 79% homologous with those of YF, whereas residues 126 to 183 are only 1[?]% homologous with the corresponding region of YF (counting gaps as mismatches). The E protein is the virus hemagglutinin and possesses the major antigenic epitopes of flaviviruses; it is tempting to speculate that variously conserved domains will correlate with group, type and strain-specific epitopes associated with flavivirus E proteins (Heinz et al., 1982, 1983; Gentry et al., 1982; Henchal et al., 1982; Peiris et al., 1982; Schlesinger et al., 1983, 1984). The overall amino acid sequence homology between MVE E and YF E is 44%, a perhaps surprisingly low value in view of the close serological relationship found among all mosquito-borne flaviviruses (see Discussion).

The features of M and E discussed above predict that most of E will be external to the viral membrane and that about half of M will be similarly exposed. Digestion of flavivirus virions with proteases confirms this prediction (Heinz & Kunz, 1979; F. Heinz, personal communication; J. Pata & C. Rice, unpublished results).

(h) The MVE non-structural proteins

The first of the non-structural proteins, NS1, is a glycoprotein (Shapiro et al., 1973; Westaway, 1975; Schlesinger et al., 1983). NS1 is the soluble complement-fixing antigen (SCF) for dengue virus type 2 (Smith & Wright, 1985) and probably also for YF (Russell et al., 1980; Schlesinger et al., 1983). At least some NS1 is present at the plasma membrane, since monoclonal antibodies to YF NS1 are capable of mediating complement-dependent lysis of infected cells (J. Schlesinger, personal communication), and antiserum specific for dengue 2 SCF shows some cell-surface staining in immunocytochemical studies (Cardiff & Lund, 1976). The function of NS1 is unclear, but it may be involved in virus assembly rather than RNA replication.

The N terminus of YF NS1 has been established by amino acid sequencing (J. Schlesinger, R. Aebersold & C. Rice, unpublished results; Rice et al., 1985) and from the homology of the YF and MVE polyproteins the corresponding start of MVE NS1 can be predicted. The C terminus of NS1 has not been established by sequencing, but from molecular weight estimates (of YF NS1) NS1 is about 520 amino acids long. The C terminus may be in a hydrophobic region of fairly low homology between MVE and YF (Fig. 7), consistent with the hypothesis that NS1 is an integral membrane glycoprotein anchored in the membrane near its C terminus. Cleavage at the C terminus of NS1 to separate it from the next non-structural protein may be catalyzed by cellular signalase. The membrane-associated and soluble forms of NS1 could conceivably differ by the presence or absence of this putative C-terminal hydrophobic segment; however, no direct evidence exists to support this hypothesis.

MVE NS1 is moderately hydrophilic (particularly in comparison with ns2a and ns2b) and shows regions of homology with YF which range from under 20% to much higher values (Fig. 7). It contains three possible glycosylation sites of the type Asn-X-Thr/Ser, at Asn130, Asn175 and Asn207. The first and third sites are conserved in YF (Fig. 6). It is not known if all three positions are in fact glycosylated in MVE, but the conservation of two of the sites between two different flaviviruses suggests they are important for function.
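The search for possible glycosylation sites of the type Asn-X-Thr/Ser can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the toy peptide are hypothetical, and the exclusion of Pro at the X position is a commonly applied refinement that the text itself does not state.

```python
import re

# Asn-X-Thr/Ser sequon. Excluding Pro at X is an assumption added here;
# the lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an Asn-X-Thr/Ser sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not an MVE sequence.
toy = "MKNGSAANPTLLNNTSQ"
print(find_sequons(toy))  # [3, 13, 14]
```

Applied to two aligned proteins, the same scan would show which candidate sites are conserved as to position, which is the comparison made above for prM, E and NS1.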

The next non-structural proteins, ns2a and ns2b, have not been established rigorously as being produced in flavivirus-infected cells and no amino acid sequence data exist to establish their start points. It is for this reason that we refer to them in lower-case letters, to indicate this uncertainty in assignment. This region of the polyprotein is poorly conserved between MVE and YF with, overall, 27% and 35% amino acid sequence homology in the ns2a and ns2b regions, respectively. Both ns2a and ns2b are hydrophobic in nature (Fig. 7). It is of interest that the hydrophobicity pattern is conserved between the two viruses even though amino acid sequence homology is limited. The function of this region in flavivirus replication is unknown. Finally, we have obtained the sequence of almost half of MVE NS3, a protein of about 70,000 Mr. NS3 is fairly hydrophilic and for YF has a net positive charge. It is more highly conserved between MVE and YF than the other proteins sequenced (Fig. 7), the overall amino acid sequence homology being 55% over the region sequenced. Within the region sequenced there are highly conserved domains, as for example residues 119 to 151 of MVE, in which only four mismatches occur with the YF sequence (88% homology). NS3 is almost certainly involved in RNA replication.
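The homology percentages quoted in this section are simple ratios of matching positions to aligned positions, with gaps counted as mismatches. A minimal sketch follows; the function name and the short aligned strings in the usage note are hypothetical, and only the 119 to 151 arithmetic comes from the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical residues.

    Both strings must come from the same pairwise alignment; a column
    containing a gap ('-') is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Arithmetic check on the NS3 example: residues 119 to 151 span 33
# positions, of which 4 are mismatches.
span = 151 - 119 + 1
print(round(100 * (span - 4) / span))  # 88
```

For instance, `percent_identity("ACDE-G", "ACDFQG")` gives about 66.7, since four of the six aligned columns match.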

Codon usage in flaviviruses is not random (Table 2). Note that the Leu codon CUG is used much more often than CUA and that UUA is seldom used. The GUG codon for Val is the most frequently used and GUA is infrequently used. There is also infrequent use of codons containing the C-G doublet. MVE and YF show slightly different use frequencies, but the same biases in codon use. For the genomes of both MVE and YF the dinucleotide frequency is non-random (Table 3). In particular, note the low occurrence of the C-G doublet.

4. Discussion

This paper and that on the sequence of the YF genome by Rice et al. (1985) contain the first detailed information on the organization of the flavivirus genome and the relationship at the sequence level between flaviviruses belonging to two serological subgroups of the mosquito-borne flaviviruses. We had expected the flaviviruses to demonstrate considerable amino acid sequence homology because they form a tight serological group (Porterfield, 1980). The homology between MVE and YF is 42% over the region of MVE sequenced. For this reason MVE sequences could be unambiguously aligned with those of YF, and MVE clones for sequencing could be selected rapidly. However, the overall homology was less than expected. Our data show that the extensive serological cross-reactions between flaviviruses exist despite extensive divergence on the part of the viruses, and suggest that certain highly conserved epitopes encoded in the flavivirus genome fulfil an important biological role. It is interesting that, although alphavirus glycoproteins exhibit an overall sequence homology similar to that shown by the E proteins, this sequence homology is more uniformly distributed over the proteins than for MVE and YF (Dalgarno et al., 1983; Strauss & Strauss, 1985). Therefore, it is of note that serological cross-reactions between alphaviruses are more limited than those observed between flaviviruses.
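The codon usage and dinucleotide frequency tabulations referred to in the Results (Tables 2 and 3) can be reproduced for any sequence along the following lines. This is a sketch under assumptions: the function names and the toy sequence are hypothetical, and the expected doublet frequency is taken as the product of the single-base frequencies, which is one common convention and not necessarily the one used for Table 3.

```python
from collections import Counter


def codon_counts(orf: str) -> Counter:
    """Count in-frame codons of an open reading frame (trailing partial codon ignored)."""
    rna = orf.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def dinucleotide_ratio(seq: str, doublet: str) -> float:
    """Observed/expected frequency of a doublet such as 'CG'.

    Expected frequency is assumed to be the product of the two
    single-base frequencies in the same sequence.
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    if n < 2:
        raise ValueError("sequence too short")
    observed = sum(1 for i in range(n - 1) if rna[i:i + 2] == doublet) / (n - 1)
    expected = (rna.count(doublet[0]) / n) * (rna.count(doublet[1]) / n)
    if expected == 0:
        raise ValueError("doublet bases absent from sequence")
    return observed / expected


# Hypothetical toy reading frame, not flavivirus sequence.
toy = "AUGCUGCUGGUGCUAUAA"
print(codon_counts(toy).most_common(1))  # [('CUG', 2)]
print(dinucleotide_ratio(toy, "CG"))     # 0.0
```

A ratio well below 1 for the C-G doublet would correspond to the low occurrence noted in the text.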

NS1, the first of the non-structural proteins, is a glycoprotein that could function in virus assembly rather than RNA replication, and therefore might have a structural or morphogenetic function rather than an enzymatic one. In this context the "structural" region would occupy one-third of the genome, a more typical figure for animal viruses.

The three glycoproteins encoded by MVE and YF are located next to one another in the genome and probably have the conventional orientation of N terminus outside and a C-terminal transmembrane anchor. It is somewhat surprising that E, which fulfils the same biological roles as the (external) glycoprotein(s) of other enveloped viruses, contains a glycosylation site that is not conserved as to position between YF and MVE. Both glycosylated and non-glycosylated forms of E occur in some flaviviruses (Wright et al., 1981; Wright, 1982; Schlesinger et al., 1983), suggesting that glycosylation is not crucial for the function of the protein.

The conservation of cysteine residues in the structural region of the genome and in NS1, together with the lack of conservation in the non-structural region, is reminiscent of the situation found for other RNA viruses, including alphaviruses. Alphaviruses and flaviviruses appear to use extensive disulfide bonding to stabilize the conformation of their structural envelope proteins but do not use such bonds as extensively for the enzymatically active proteins. Presumably this reflects a difference in the stability required for proteins that interact with the external environment, for example for protection against serum proteases, vis-à-vis proteins active in the relatively constant environment of the cell cytosol. Although NS1 is not found in the final virion structure, the conservation of cysteine residues between MVE and YF is consistent with some type of structural role, perhaps in virus morphogenesis.

YF RNA has an open reading frame of 10,233 nucleotides (Rice et al., 1985). MVE also appears to possess only one long open reading frame that encodes proteins homologous to those of YF. It seems unlikely that a long open reading frame would be maintained if a major fraction of any flavivirus protein were produced by a method other than translation of a long polyprotein followed by post-translational processing. The existence of a consistent set of proteolytic cleavage sites that could explain the generation of the final products supports this hypothesis, as does the recent finding of possible polyproteins in cells infected with dengue virus (G. Cleaves, personal communication) and JE (P. S. Eastman, J. O. Mecham & C. D. Blair, unpublished results). The hypothesis of Westaway (Westaway, 1973, 1977; Westaway et al., 1984), that multiple independent initiation sites are used for translation of flavivirus proteins, must therefore be reassessed. We have re-examined the experiments on which this hypothesis was based in light of the order of the flavivirus proteins established from the nucleotide sequence data. The data on runoff synthesis of Kunjin proteins after pactamycin inhibition of continued initiation (Westaway, 1977) in fact support the hypothesis that flavivirus proteins are translated as one long polyprotein, as do translation studies in vitro with both JE (Shapiro et al., 1973) and Kunjin virus (Westaway & Shew, 1977) RNAs in the presence of puromycin. The gene order deduced from these studies agrees with that found by nucleotide sequence analysis. It seems possible that the remaining data supporting independent translation initiation sites are artifactual. We suggest, first, that in the experiments in which high salt was used to synchronize initiation of protein synthesis, the rapid incorporation of label into all Kunjin proteins upon release from the high salt block could have been due to preinitiated chains that survived the high salt treatment rather than to reinitiation at many points in the genome. Reinitiation may be slow because of the less than ideal context of the initiation codon, discussed above. Second, interpretation of the ultraviolet mapping experiments that indicated at least two initiation sites (Westaway et al., 1981) is subject to possible error if a viral protease must be translated to process virus precursors and/or if ribosome transit time is slow in certain regions of the genome (Grantham et al., 1981). We shall discuss this topic in greater detail elsewhere.

We have postulated several different types of cleavage events for processing of the flavivirus precursor polyprotein (Rice et al., 1985). These involve three cellular proteases and one or more virus proteases. We propose that cellular proteases are active in removing the initiating methionine from the precursor (an enzyme active in the cytosol that may function while the protein is nascent), in cleaving prM to M during virus maturation (a Golgi protease that may function to process glycoproteins of many virus groups), and in separating the three glycoproteins from the precursor (signalase, active in the lumen of the endoplasmic reticulum). The cleavage sites in each case (Table 1) are consistent with the known activities of these proteases. We further postulate that a virus-encoded protease generates the various non-structural proteins (excluding NS1), and cleaves after double basic residues (Arg-Arg in YF, Lys-Arg in MVE) surrounded by short side-chain amino acids. This viral protease is probably encoded in the non-structural region, and we have previously argued that all virus polyprotein cleavages occurring in the cytosol are catalyzed by virus-encoded enzymes (Rice & Strauss, 1981a).

The C termini of the various proteins have not been sequenced and it is not known if small peptides are removed. An intriguing possibility is that the mature capsid protein terminates after a cluster of basic amino acids about 20 residues upstream from the start of prM, in the sequence Lys-Arg↓Gly for MVE or Arg-Arg↓Ser for YF (Fig. 6). Cleavage at the positions denoted by the arrow would remove the putative C-terminal hydrophobic domain that could anchor the capsid protein in the membrane. If such a cleavage occurred during virus maturation it might permit rapid assembly of particles and could explain the failure to observe preassembled capsids in flavivirus-infected cells. In this model, if all of the viral components could assemble but membrane association of capsid protein prevented it from assembling into a capsid structure, cleavage would allow rapid capsid assembly and possibly simultaneous envelopment. In addition, cleavage of prM could be involved in assembly also, as could interactions with NS1. It is of considerable interest that this possible cleavage site has the same recognition sequence as that for the non-structural protease for both MVE and YF, although as noted earlier it is conceivable that either prM or NS1 might possess proteolytic activity.
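The proposed recognition sequence, a pair of basic residues followed by a short side-chain amino acid, lends itself to a simple scan of a polyprotein sequence. The sketch below makes several assumptions that are not in the text: "short side-chain" is taken to mean Gly, Ala or Ser, only the residue after the basic pair is checked, and the function name and toy peptide are hypothetical.

```python
import re

# Assumed set of "short side-chain" residues; the text gives Gly and Ser
# as examples but does not define the set.
SHORT = "GAS"

# Lys-Arg or Arg-Arg, followed by a short side-chain residue.
SITE = re.compile(rf"[KR]R(?=[{SHORT}])")


def dibasic_sites(protein: str) -> list[int]:
    """Return the 1-based position of the residue that would follow each candidate cleavage."""
    return [m.end() + 1 for m in SITE.finditer(protein.upper())]


# Hypothetical toy peptide with one Lys-Arg|Gly and one Arg-Arg|Ser site.
toy = "MLKRGAVLTRRSLE"
print(dibasic_sites(toy))  # [5, 12]
```

A scan of this kind only proposes candidates; as the text notes, none of the C termini has been confirmed by sequencing.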

The amino acid sequence homology between MVE and YF proteins is evidence that flaviviruses have descended from a common ancestor. The flaviviruses have diverged to the extent that codon usage for conserved amino acids has been essentially randomized. The conservation of amino acid sequences therefore reflects a strong selection for retention of certain functions. Similar observations have been made for other plus-stranded RNA viruses. This codon randomization can be seen by comparing amino acids 383 to 413 (numbered from the polyprotein sequence, Fig. 4) of E and amino acids 1575 to 1712 of NS3 of MVE and YF. In these stretches of 169 amino acids, 113 are conserved (67% homology, counting gaps as mismatches). Of these conserved amino acids, 48 use the same codon in both viruses, whereas statistical analysis taking codon usage into account would predict that 41 codons would be the same by chance alone. […]

Selection pressure against silent changes is probably low, although such pressure does exist. Domingo et al. (1978) found that a bacteriophage Qβ RNA population consisted of individual molecules that differed from the average sequence by a mean of 1.7 changes, yet the average sequence possessed a selective advantage over any variant examined. […] The rate of mutation of RNA viruses is high (Holland et al., 1982; Reanney, 1982). […] Thus the average RNA sequence must have a secondary structure optimal for RNA replication, translation and encapsidation under the conditions used.

The infrequent occurrence of the C-G dinucleotide in flaviviruses is characteristic of mammalian and avian DNA (Russell et al., 1976; Bird, 1980) and of many viruses that replicate in these vertebrates, but not of insect DNA or of alphaviruses […]. The significance of the low C-G frequency in flaviviruses is obscure at present. […]
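The number of conserved positions expected to share a codon by chance can be estimated from the codon usage of the two viruses. The statistical procedure actually used is not described here, so the following is only one plausible sketch: it assumes that each virus picks a codon for a given amino acid independently, with probabilities equal to its own usage fractions. The function name and the toy usage tables are hypothetical.

```python
def expected_same_codon(conserved, usage_a, usage_b):
    """Expected count of conserved positions that share a codon by chance.

    conserved: iterable of one-letter amino acids at the conserved positions.
    usage_a, usage_b: {amino_acid: {codon: fraction}} for the two viruses,
        with fractions summing to 1 for each amino acid.
    Assumes independent codon choice in the two viruses.
    """
    total = 0.0
    for aa in conserved:
        fa, fb = usage_a[aa], usage_b[aa]
        # Probability that both viruses use the same codon for this residue.
        total += sum(p * fb.get(codon, 0.0) for codon, p in fa.items())
    return total


# Hypothetical usage fractions, not taken from Table 2.
usage_a = {"M": {"AUG": 1.0}, "K": {"AAA": 0.5, "AAG": 0.5}}
usage_b = {"M": {"AUG": 1.0}, "K": {"AAA": 0.25, "AAG": 0.75}}
print(expected_same_codon("MKK", usage_a, usage_b))  # 2.0
```

An observed count close to this expectation is what "essentially randomized" codon usage would look like.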

We are grateful for helpful discussions with numerous colleagues, including C. Blair, G. Cleaves, F. Heinz, E. Strauss and G. Wengler, many of whom furnished preprints of work prior to publication. E. M. Lenches furnished excellent technical assistance and the computer programs of T. Hunkapiller and the computer facility of L. Hood were of great value in producing the Figures. This work was supported in part by grants AI 20612 and AI 10793 from NIH and grant PCM 83-16856 from NSF.

References

Anderson, S. G., Donnelley, M., Stevenson, W. J., Caldwell, N. J. & Eagle, M. (1952). Med. J. Aust. 1, 110-114.
Bell, J. R., Kinney, R. M., Trent, D. W., Strauss, E. G. & Strauss, J. H. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 4702-4706.
Murphy, F. A. (1980). In The Togaviruses (Schlesinger, R. W., ed.), pp. 241-316, Academic Press, New York.
Naeve, C. W. & Trent, D. W. (1978). J. Virol. 25, 535-545.
Obijeski, J. F., Marchenko, A. T., Bishop, D. H. L., Cann, B. W. & Murphy, F. A. (1974). J. Gen. Virol. 22, 21-33.
Ohyama, A., Ito, T., Tanimura, E., Huang, S.-C., Hsue, J.-Y. & Furu, Y. (1977). Microbiol. Immunol. 21, 535-538.
Okayama, H. & Berg, P. (1982). Mol. Cell. Biol. 2, 161-170.
Ou, J.-H., Trent, D. W. & Strauss, J. H. (1982b). J. Mol. Biol. 156, 719-730.
Ou, J.-H., Strauss, E. G. & Strauss, J. H. (1983). J. Mol. Biol. 168, 1-15.
Peiris, J. S. M., Porterfield, J. S. & Roehrig, J. T. (1982). J. Gen. Virol. 58, 283-289.
Porterfield, J. S. (1980). In The Togaviruses (Schlesinger, R. W., ed.), pp. 13-46, Academic Press, New York.
Reanney, D. C. (1982). Annu. Rev. Microbiol. 36, 47-73.
Rice, C. M. & Strauss, J. H. (1981a). Proc. Nat. Acad. Sci., U.S.A. 78, 2062-2066.
Rice, C. M. & Strauss, J. H. (1981b). J. Mol. Biol. 150, 315-340.
Rice, C. M., Lenches, E. M., Eddy, S. R., Shin, S. J., Sheets, R. L. & Strauss, J. H. (1985). Science, 229, 726-733.
Shapiro, D., Brandt, W. E. & Russell, P. K. (1972). Virology, 50, 906-911.
Shapiro, D., Kos, K. A. & Russell, P. K. (1973). Virology, 56, 88-94.
Shope, R. E. (1980). In The Togaviruses (Schlesinger, R. W., ed.), pp. 47-82, Academic Press, New York.
Smith, G. W. & Wright, P. J. (1985). J. Gen. Virol. 66, 559-571.
Strauss, E. G. & Strauss, J. H. (1985). In Virus Structure and Assembly (Casjens, S. J., ed.), pp. 205-234, Jones and Bartlett Publishers, Boston.
Strauss, E. G., Rice, C. M. & Strauss, J. H. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 5271-5275.
Svitkin, Y. V., Ugarova, T. Y., Chernovskaya, T. V., Lyapustin, V. N., Lashkevich, V. A. & Agol, V. I. (1981). Virology, 110, 26-34.
Svitkin, Y. V., Lyapustin, V. N., Lashkevich, V. A. & Agol, V. I. (1984). Virology, 135, 536-541.
Takio, K., Towatari, T., Katunuma, N., Teller, D. C. & Titani, K. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3666-3670.
Welch, W. J. & Sefton, B. M. (1980). J. Virol. 29, 1186-1195.
Wengler, G. & Wengler, G. (1981). Virology, 113, 544-555.
Wengler, G., Wengler, G. & Gross, H. J. (1978). Virology, 89, 423-437.
Wengler, G., Beato, M. & Wengler, G. (1979). Virology, 96, 516-529.
Westaway, E. G., McKimm, J. L. & McLeod, L. G. (1977). Arch. Virol. 53, 305-312.
Westaway, E. G., Schlesinger, R. W., Dalrymple, J. M. & Trent, D. W. (1980). Intervirology, 14, 114-117.
Westaway, E. G., Speight, G. & Endo, L. (1984). Virus Res. 1, 333-348.
Wright, P. J. (1982). J. Gen. Virol. 59, 29-38.
Wright, P. J., Warr, H. M. & Westaway, E. G. (1981). Virology, 109, 418-427.

Edited by S. Brenner