VIROLOGY 165, 234-244 (1988)
Nucleotide Sequence and Deduced Amino Acid Sequence of the Nonstructural Proteins of Dengue Type 2 Virus, Jamaica Genotype: Comparative Analysis of the Full-Length Genome
VINCENT DEUBEL,¹ RICHARD M. KINNEY, AND DENNIS W. TRENT
Division of Vector-Borne Viral Diseases, Center for Infectious Diseases, Centers for Disease Control, Public Health Service, U.S. Department of Health and Human Services, Post Office Box 2087, Fort Collins, Colorado 80522
Received January 11, 1988; accepted March 16, 1988
The sequence of the 5'-end of the genome of dengue 2 (Jamaica genotype) virus has been previously reported (V. Deubel, R. M. Kinney, and D. W. Trent, 1986, Virology 155, 365-377). We have now cloned and sequenced the remaining 75% of the genomic RNA, which encodes the nonstructural proteins. The complete genome is 10,723 bases in length, with a single open reading frame extending from nucleotide 97 to nucleotide 10,269 and encoding 3391 amino acids. The 3'-noncoding extremity forms a stem- and loop-structure and contains a repeated oligonucleotide sequence. Comparisons of the nucleotide sequences of the genomes of dengue 2 viruses of different topotypes reveal 90-95% similarity, with 64-66% similarity evident between dengue viruses of different serotypes. The amino acid sequence of the polyprotein of dengue 2 Jamaica virus shows 97, 68, 50, and 44% similarity with those of other dengue 2, dengue 1 or dengue 4, West Nile, and yellow fever viruses, respectively. Despite amino acid sequence divergence, the hydrophobic profile of the flavivirus proteins is highly conserved. Proteins NS1, NS3, and NS5 are the most conserved. Conserved amino acid stretches present in all flavivirus proteins may be involved in common essential biological functions. © 1988 Academic Press, Inc.
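As a quick consistency check on the coordinates quoted in the abstract, the following Python sketch recomputes the length of the open reading frame, the number of codons, and the lengths of the flanking noncoding regions. It assumes 1-based, inclusive nucleotide coordinates; the variable names are illustrative and are not taken from the paper.

```python
# Genome coordinates quoted in the abstract (assumed 1-based, inclusive).
GENOME_LENGTH = 10723
ORF_START = 97
ORF_END = 10269

orf_length = ORF_END - ORF_START + 1        # nucleotides in the open reading frame
codons, remainder = divmod(orf_length, 3)   # whole codons and leftover bases
five_prime_nc = ORF_START - 1               # bases upstream of the first codon
three_prime_nc = GENOME_LENGTH - ORF_END    # bases downstream of the reading frame

print(orf_length, codons, remainder)        # 10173 3391 0
print(five_prime_nc, three_prime_nc)        # 96 454
```

Under this reading the numbers agree with the 10,173-nucleotide reading frame, the 3391 amino acids, and the 454-nucleotide 3'-untranslated sequence reported in the Results.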
INTRODUCTION
Dengue viruses belong to the Flaviviridae, a family comprising more than 60 agents that are serologically closely related and share a unique structure, replication strategy, and morphogenesis. The mosquito-borne flaviviruses can be grouped on the basis of virus neutralization into three serologic complexes, dengue (DEN), yellow fever (YF), and Japanese encephalitis (JE). The single-stranded 4 × 10⁶ Da RNA genome is positive sense and lacks a 3'-poly(A) tract. It encodes a single polyprotein of about 3400 amino acid residues in the gene order 5'-C.prM.M.E.NS1.NS2A.NS2B.NS3.ns4a.NS4B.NS5-3' (Rice et al., 1985; Castle et al., 1985, 1986; Wengler et al., 1985; Coia et al., 1988; Speight et al., 1988). Individual proteins appear to be processed by post-translational cleavage of the polyprotein at specified sites recognized by cellular or virus-encoded proteases (Rice et al., 1985; Speight et al., 1988). The size and functions of the virion structural proteins have been characterized (Rice et al., 1985; Biedrzycka et al., 1987); however, the functions of the nonstructural proteins have not been assigned. Proteins NS3 and NS5 may have enzymatic roles in RNA replication (Mackow et al., 1987). The 5'- and 3'-extremities of the flavivirus genome are not translated and contain stem- and loop-structures that may be involved in regulation of transcription or replication (Rice et al., 1985; Sumiyoshi et al., 1987; Wengler and Castle, 1986).
The four DEN virus serotypes have different pathogenicities for humans, ranging from nondescript febrile illness to the DEN hemorrhagic fever and shock syndrome (DHF/DSS) (Halstead, 1980). Dengue fever is the most important vector-borne viral disease of man, occurring in tropical latitudes where large epidemics may involve up to several million people. Temporal and geographic distribution of specific DEN virus genotypes have been determined with serologic (Monath et al., 1986) and molecular markers (Kerschner et al., 1986; Trent et al., 1983). Although DHF/DSS was endemic in southeast Asia before the 1960s and in the Americas before the 1980s, hemorrhagic disease has increased because of urbanization permitting higher Aedes species transmission rates, endemic circulation of multiple serotypes of DEN virus, and the transmission of more virulent DEN virus variants (Halstead, 1987; Rosen, 1986). In 1981 DHF/DSS occurred for the first time in epidemic form in Cuba (Guzman et al., 1984). A new DEN2 genotype was isolated from severe DEN cases in Jamaica in 1981-1983 and Puerto Rico in 1986-1987. The latter genotype differed from the Puerto Rican genotype previously circulating in the Americas (Trent et al., 1983).
Failure of Aedes aegypti eradication programs has necessitated development of DEN vaccines. Live-attenuated DEN virus vaccines have been derived by repeated passages in cell culture; however, the basis of attenuation is unknown (Eckels et al., 1976). By comparing genome sequences, it may be possible to determine molecular markers of virulence (Hahn et al., 1987, 1988). The nucleotide sequence of the 5'-end portion of the genome encoding the structural proteins of DEN 2 virus, Jamaica genotype (DEN2JAM), has been published (Deubel et al., 1986). We now report the nucleotide sequence of the entire genome of DEN2JAM virus and compare it with those of other DEN viruses: DEN 2 Puerto Rico-S1 vaccine (DEN2S1; Hahn et al., 1988), DEN 2 New Guinea C (DEN2NGC; Yaegashi et al., 1986), DEN 4 (Zhao et al., 1986; Mackow et al., 1987), and DEN 1 (Mason et al., 1987). These results have provided new information on flavivirus evolution and replication, and defined structural characteristics of the virus genome useful in DEN virus vaccine development.
¹ To whom requests for reprints should be addressed at present address: Molecular Virology, Institut Pasteur, 25 rue du Dr. Roux, 75724 Paris Cedex 15, France.
0042-6822/88 $3.00. Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.
MATERIALS AND METHODS
DEN virus and cell culture
DEN 2 virus strain 1409, isolated in 1983 from a human in Jamaica, was used for cloning and sequencing. This virus was passed three times in Aedes albopictus C6/36 cell cultures before being plaque-purified in LLC-MK2 cells. Cell infection and virus purification by isopycnic centrifugation in potassium tartrate and glycerol gradients were performed as previously described (Trent et al., 1983).
Cloning of DEN2JAM RNA
Virion RNA was extracted from purified virus with SDS and phenol-chloroform, and cDNA synthesis was primed using oligonucleotides complementary to a highly conserved sequence in the flavivirus genome (Fig. 1). cDNA synthesis using heat-denatured DEN2JAM RNA as template was performed as previously described (Deubel et al., 1986). Double-stranded cDNA was cloned into the PstI site of pUC18 plasmid. Selected cDNA clones were ligated into the filamentous coliphage M13mp18 or mp19 (Yanisch-Perron et al., 1985), partially digested using the exonuclease activity of bacteriophage T4 polymerase (Dale et al., 1985), and sequenced by the dideoxy chain-termination method of Sanger et al. (1977).
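Because the primers are complementary to the genomic RNA, the genome-sense target of each primer is its reverse complement. The Python sketch below illustrates this with primer VD8 and also expands the degenerate positions of primer VD1, both copied from the Fig. 1 legend. Reading a base in parentheses as an alternative to the base immediately before it is our assumption about the notation, and the function names are illustrative only.

```python
from itertools import product

# Watson-Crick complements for the DNA alphabet.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def expand_degenerate(primer: str) -> list[str]:
    """Expand a primer written like 'ACG(A)GC', where a base in parentheses
    is read as an alternative to the base just before it (assumed notation)."""
    positions = []  # one list of allowed bases per primer position
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions[-1].extend(primer[i + 1:close])
            i = close + 1
        else:
            positions.append([primer[i]])
            i += 1
    return ["".join(choice) for choice in product(*positions)]

vd8 = "GGGTCTCCTCTAACCTCTAG"          # primer VD8 (Fig. 1 legend)
vd1 = "ACG(A)GCT(C)CCA(G)ATTTCTCC"   # primer VD1 with three degenerate positions

print(reverse_complement(vd8))        # genome-sense sequence targeted by VD8
print(len(expand_degenerate(vd1)))    # number of distinct VD1 sequences
```

With three two-fold degenerate positions, VD1 corresponds to a pool of eight oligonucleotides under this reading.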
Computer analysis
Computer analysis was performed on an MV8000 instrument using a sequence analysis program designated SASIP (Institut Pasteur).
RESULTS
Determining the DEN2JAM nucleotide sequence
Clone p30-VD2 contained genes coding for the DEN2JAM virus structural proteins (Deubel et al., 1986). Clones covering the nonstructural part of the DEN2JAM virus genome were prepared using primer VD1, complementary to a consensus sequence of YF, MVE, and WN viruses (Dalgarno et al., 1986). Primers designated VD3 and VD9 were complementary to the 3'-extremities of DEN 4 (Zhao et al., 1986) and DEN2S1 viruses (Hahn et al., 1988), respectively. Five cDNA clones selected for sequence analysis were subcloned in phage M13, and overlapping clones were prepared for sequencing using T4 polymerase controlled digestion (Dale et al., 1985). The entire 10,723-base nucleotide sequence of DEN2JAM genomic RNA, including the previously determined sequence encoding the structural proteins (Deubel et al., 1986), is shown in Fig. 2. The length is similar to that determined for other flaviviruses (Rice et al., 1985; Castle et al., 1985, 1986; Wengler et al., 1985; Mackow et al., 1987; Hahn et al., 1987, 1988; Sumiyoshi et al., 1987; Coia et al., 1988). The 3'-extremity of the genome corresponds to the VD9 primer sequence derived from the DEN2S1 virus (Hahn et al., 1988) and has not been determined by direct sequencing of the 3'-terminus of the RNA. Consequently, the first 30 nucleotides at the 3'-end of the DEN2JAM RNA remain putative (Fig. 2), although the secondary structure (Wengler and Castle, 1986) in which they are involved seems conserved in the DEN viruses (Fig. 3). Of the 3'-terminal bases 31 to 93 (Fig. 3), DEN2JAM differed from DEN2S1 virus only at position 42, which is G in DEN2S1 virus. The 3'-end sequence differences between DEN 2 and DEN 4 viruses did not affect the secondary structure, as four of the base substitutions maintained base pairing in the stem (arrows in Fig. 3). By comparison with DEN2S1, eight additional base changes were located in the 3'-loop-structure.
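The observation that substitutions can leave the stem intact can be illustrated with a small base-pairing check. The Python sketch below is only an illustration: the four-base arms are hypothetical and are not taken from Fig. 3, and treating G-U wobble pairs as paired is an assumption on our part.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair found in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_is_paired(five_prime_arm: str, three_prime_arm: str) -> bool:
    """True if every base of the 5' arm pairs with the facing base of the
    3' arm. Both arms are written 5'->3', so the 3' arm is read backwards."""
    if len(five_prime_arm) != len(three_prime_arm):
        return False
    return all((a, b) in PAIRS
               for a, b in zip(five_prime_arm, reversed(three_prime_arm)))

# Hypothetical 4-bp stem (not from the paper).
reference = ("GGAC", "GUCC")
single_change = ("GAAC", "GUCC")   # one arm mutated: a pair is lost
compensated = ("GAAC", "GUUC")     # a second change restores the pair

for arms in (reference, single_change, compensated):
    print(arms, stem_is_paired(*arms))
```

A substitution on one arm disrupts the stem unless the facing base still pairs with it or has changed as well, which is the sense in which paired substitutions can maintain the secondary structure.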
The open reading frame in the genome of DEN2JAM RNA begins at nucleotide 97 (Deubel et al., 1986), remains open for 10,173 nucleotides, and encodes 3391 amino acids. The 3'-untranslated sequence is 454 nucleotides in length (Fig. 2). Nucleotide base composition of the entire genome is 33.2% A, 21% U, 25.3% G, and 20.5% C, similar to other DEN2 RNAs (Vezza et
DEUBEL, KINNEY, AND TRENT
[Figure 1: schematic of the genome (5' to 3', scale in kilobases) showing the overlapping cDNA clones and the positions of the synthetic primers.]
FIG. 1. Cloning and sequence strategy of dengue 2 virus cDNA. Synthetic oligonucleotides (black triangles) used to prime cDNA synthesis: VD1, 5'-ACG(A)GCT(C)CCA(G)ATTTCTCC-3'; VD3, 5'-AGAACCTT(G)TTGGATCAACAACACCA-3'; VD5, 5'-CTCACTTTCAATTCCGATGTTGCGGG-3'; VD8, 5'-GGGTCTCCTCTAACCTCTAG-3'; VD9, 5'-AGAACCTGTTGATTCAACAGCACCATTCCATTTTCTGGCG-3'. Overlapping clones are designated with a selecting number followed by the name of the primer. Clone p30-VD2 was analyzed previously (Deubel et al., 1986). Single-stranded DNAs in phage M13 were sequenced after partial exonuclease digestion. Arrows indicate the sense of sequencing in the strand, and vertical bars indicate the beginning of each digested subclone.
al., 1980; Hahn et al., 1988). Codon usage for the 3391 amino acids was not random (data not shown), showing the same distribution as that reported for the region encoding the structural proteins (Deubel et al., 1986). Usage of codons containing the C-G dinucleotide was rare.
Nonstructural DEN2 polyprotein processing
The deduced amino acid sequence of DEN2JAM proteins is shown in Fig. 2. The starting point for most of the nonstructural proteins has been assigned by amino acid alignment with previously sequenced amino termini of DEN 2 (Biedrzycka et al., 1987) or Kunjin (KUN; Speight et al., 1988) virus proteins (Table 1). Only the amino terminus of ns4a has not been sequenced, and its starting point remains putative. DEN2JAM virus genes are organized as initially established for YF virus (Rice et al., 1985), with the structural proteins C, prM(M), and E (Deubel et al., 1986) followed by the nonstructural proteins NS1, NS2A, NS2B, NS3, ns4a, NS4B, and NS5. The hydrophobicity profile of the DEN2JAM polyprotein is shown in Fig. 4A. Processing of the structural proteins may involve cellular (including Golgi) proteases (Fig. 4B), with cleavage occurring after signal sequences that contain hydrophobic regions preceding prM, E, and NS1 (Rice et al., 1986b). The sequence V-X-A, where X is an uncharged
FIG. 2. Complete nucleotide sequence of the dengue 2 virus genome, Jamaica genotype, and the deduced amino acid sequence. The 5'-region of the genome coding for the structural proteins was previously analyzed (Deubel et al., 1986). Protein nomenclature follows that of Rice et al. (1985) as modified by Speight et al. (1988). Cleavage sites are indicated by arrows and were assigned by similarity with NH2 termini of sequenced dengue 2 or Kunjin viruses (Table 1). Only ns4a was not precisely located. Black triangles localize potential cleavage sites for ns2A and ns4B previously suggested for yellow fever virus. Asterisks indicate potential N-linked glycosylation sites. Conserved amino acid sequences in the NS5 proteins of flaviviruses are indicated by solid lines. The putative 3'-end of the DEN2JAM virus genome corresponding to the VD9 primer sequence is indicated. Termination codons are boxed. Repetitive nucleotide sequence in the 3'-terminal untranslated region of the dengue genome is underlined. Single letter abbreviations: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine.
amino acid, preceding the first amino acid of NS1 (Table 1) indicated a possible signal peptidase site (von Heijne, 1986). Amino acid sequencing of the NH2-terminus of protein NS2A of KUN virus (Speight et al., 1988) revealed a possible cleavage site with the same consensus sequence V-X-A in the (-1) to (-3) position. Because no stop transfer or translocation sequences appear upstream from this signal (Sabatini et al., 1982), Coia et al. (1988) suggested that this site is conjectural. A viral protease or a signal peptidase located within the lumen of the endoplasmic reticulum may be involved at this cleavage site (Fig. 4). The putative amino-terminal cleavage site of NS4B of DEN 2 virus does not conform exactly to the V-X-A sequence (Thr replaces Val) but is probable (von Heijne, 1986). The long hydrophobic amino acid stretch immediately preceding this site may indicate cleavage by a signalase (Fig. 4). The other nonstructural proteins NS2B, NS5, and the putative ns4a seem to be cleaved after two basic residues (Arg or Lys) and before Gly or Ser residues (Table 1). Although DEN 2 and DEN 4 virus NS3 proteins are cleaved at the site K-X-Q-R/X-G (Table 1), the cluster of basic amino acids at this site may be recognized by the proposed viral protease (Rice et al., 1985). N-termini of all proteins but ns4a have been assigned by direct amino acid sequencing (Rice et al., 1986a; Speight et al., 1988), but none of the C-termini
TABLE 1
CLEAVAGE SITE AT THE N-TERMINUS OF DEN 2 NONSTRUCTURAL PROTEINS BASED ON ALIGNMENT WITH AMINO ACID SEQUENCE OF OTHER FLAVIVIRUS PROTEINS
Protein   Virus      Upstream of site   Downstream of site
NS1       DEN 2ᵃ     GAMVQA             DSGCVV
          DEN 4      .FT...ᵇ            .M....
          YFᵃ        SLG.G.             .Q..AI
          KUNJINᵃ    SVN.H.             .T..AI
          WN         SVN.H.             .T..AI
NS2A      DEN 2      NSLVTA             GHGQID
          DEN 4      K.Q...             Q.TSE
          YF         R.N...             .EIHAV
          KUNJINᵃ    Q.Q.N.             YNADM
          WN         Q.R.N.             YNADMI
NS2B      DEN 2      RTSKKR             SWPLNE
          DEN 4      KGASR.             ......
          YF         .IFGR.             .I.V..
          KUNJINᵃ    DPNR..             G..AS.
          WN         DPNR..             G..AS.
NS3       DEN 2ᵃ     EVKKQR             AGVLWD
          DEN 4      Q..T..             S.A...
          YFᵃ        VRGAR.             S.DVLW
          KUNJINᵃ    LQYTK.             G...
          WN         LQYTK.             G.....
ns4a      DEN 2      FAAGRK             SLTLNL
          DEN 4      ..S...             .I..DI
          YF         ..E..R             GAAEV.
          KUNJIN     ..S.KR             .QIGFI
          WN         ..S.KR             .QIGLV
NS4B      DEN 2      VAATMA             NEMGFL
          DEN 4      IGLIA.             ....LI
          YF         .S.VA.             ..L.M.
          KUNJINᵃ    .G.VA.             ....WL
          WN         .G.VA.             ....WL
NS5       DEN 2ᵃ     TTNTRR             GTGNIG
          DEN 4      AQTP..             ...TT.
          YFᵃ        K.G-..             .SA.GK
          KUNJINᵃ    KPGLK.             .GAK
          WN         KPGLK.             .GAKGR
ᵃ Sequences established by N-terminus amino acid sequence of purified viral protein: DEN 2 (Biedrzycka et al., 1987), YF (Rice et al., 1986a), KUN (Speight et al., 1988).
ᵇ ., amino acid identical to DEN2JAM.
[Figure 3: secondary-structure drawing of the 3'-terminal stem and loop (43.7 kcal).]
FIG. 3. Stem- and loop-structure proposed for the 3'-terminal sequence of DEN2JAM virus, assuming the first 30 nucleotides are identical to DEN2S1 (see text). Conserved nucleotide sequences between flaviviruses are boxed with dashed lines (one nonconserved adenosine is indicated by a dot). Circled adenosine at position 42 is a guanosine in the DEN2PRS1 sequence (Hahn et al., 1988). Arrows indicate changes in the DEN4 sequence (Zhao et al., 1986).
[Figure 4: (A) hydrophobicity plot of the polyprotein; (B) schematic of the genome from 5' to 3' showing the proposed cleavage points and the cellular protease, viral protease, or signalase proposed for each.]
FIG. 4. Dengue 2 genome organization. Protein nomenclature follows that of Rice et al. (1985) modified by Speight et al. (1988). (A) Hydrophobicity plot of the DEN2JAM-encoded proteins. Amino acid indexes were obtained from Kyte and Doolittle (1982) for peptide segments nine amino acids in length. Hydrophobic domains, located below the midline, are shaded. Potential N-linked glycosylation sites are indicated by asterisks. Arrows indicate putative cleavage sites of ns2a and ns4b previously proposed by Rice et al. (1985). (B) Schematic representation of dengue 2 genome with the proposed cleavage points. Putative proteases involved in post-translational processing are proposed.
of these proteins have been determined. Sequence analysis of tryptic peptides of the NS5 protein of WN virus enabled Castle et al. (1986) to directly place the C-terminus of NS5 protein within the last four codons of the open reading frame. Experiments with KUN virus (Speight et al., 1988) suggested that a carboxypeptidase present in the cytosol could be involved in a post-translational trimming of the C-terminus of NS5 and some other nonstructural proteins. The molecular weights of the DEN 2 proteins determined by polyacrylamide gel electrophoresis (Smith and Wright, 1985) or calculated from the cDNA nucleotide sequence are listed in Table 2. Calculated molecular weights for putative proteins correlate well with experimentally determined sizes for these proteins in virus-infected cells. A set of low-molecular-weight proteins (p14, p15, p18) found in DEN 2 virus-infected cells may correspond to NS2 and NS4 proteins (Smith and Wright, 1985). Smith and Wright (1985) characterized the carbohydrate backbone of the three major glycoproteins GP20 (prM), GP46 (NS1), and GP60 (E) in the presence or absence of tunicamycin and showed the contribution of carbohydrate to the molecular weights of these proteins to be about 4000 Da for E and NS1 and 2000 Da for prM. These values increase the calculated size of the polypeptides to molecular weights compatible with those measured by gel migration (Table 2). Several potential sites for Asn-linked glycosylation exist in the remaining translated sequence corresponding to NS2A through NS5, but there is no experimental evidence for glycosylation of these nonstructural proteins (Smith and Wright, 1985).
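The hydrophobicity plot of Fig. 4A is described as a Kyte and Doolittle (1982) profile averaged over segments of nine residues. A minimal Python sketch of such a sliding-window profile is given below. The hydropathy values are the published Kyte-Doolittle indexes; the test peptide is hypothetical, and the exact windowing and plotting conventions used by the authors are not known to us.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982);
# positive values are hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Hypothetical peptide: a charged stretch followed by a hydrophobic one.
peptide = "KRDEKRDEKLLIVAVLLIV"
profile = hydropathy_profile(peptide)
print(len(profile))
print(round(profile[0], 2), round(profile[-1], 2))
```

Since the legend places hydrophobic domains below the midline, the published plot appears to correspond to this profile with the vertical axis inverted.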
TABLE 2
MOLECULAR WEIGHTS OF DEN 2 VIRUS-SPECIFIED PROTEINS DETERMINED BY POLYACRYLAMIDE GEL MIGRATION (SMITH AND WRIGHT, 1985) AND BY GENE SEQUENCING
Proteinᵃ   Gel sizing   Calculated MW (kDa)
C          p13.5(C)     13
prM        GP20         18.8ᵇ
pr         gp13         10.4ᵇ
M                       8.4
E          GP60(E)      54.3ᵇ
NS1        GP46         39.9ᵇ
NS2A       ?            23.4
NS2B       ?            14.2
NS3        p67          69.3
ns4a       ?            16.4
NS4B       ?            26.7
NS5        W            103
ᵃ New protein nomenclature was established by Rice et al. (1985) and modified by Speight et al. (1988).
ᵇ The molecular weight corresponds to the nonglycosylated form of the protein.
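The comparison between calculated and gel-derived sizes for the three glycoproteins can be written out explicitly. The Python sketch below adds the carbohydrate contributions quoted in the text (about 4 kDa for E and NS1, 2 kDa for prM) to the calculated polypeptide sizes. It assumes that the calculated values in Table 2 read 18.8, 39.9, and 54.3 kDa for prM, NS1, and E, and that the gel sizes are those implied by the names GP20, GP46, and GP60.

```python
# All values in kDa.
calculated = {"prM": 18.8, "NS1": 39.9, "E": 54.3}   # Table 2, nonglycosylated (assumed reading)
carbohydrate = {"prM": 2.0, "NS1": 4.0, "E": 4.0}    # Smith and Wright (1985), as quoted in the text
gel_size = {"prM": 20, "NS1": 46, "E": 60}           # implied by the names GP20, GP46, GP60

# Predicted size of each glycoprotein: polypeptide plus carbohydrate.
predicted = {name: calculated[name] + carbohydrate[name] for name in calculated}

for name, size in predicted.items():
    print(f"{name}: predicted {size:.1f} kDa, gel {gel_size[name]} kDa")
```

Under these assumptions each predicted size falls within about 2 kDa of the gel estimate, which is the sense in which the two sets of values are compatible.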
Evolutionary relationships between DEN serogroup viruses
The availability of nucleotide sequence information for viruses belonging to the DEN serogroup (DEN1, Mason et al., 1987; DEN4, Zhao et al., 1986; Mackow et al., 1987) and to other genetic variants of DEN 2 (DEN2NGC, Yaegashi et al., 1986; DEN2S1, Hahn et al., 1988) has permitted a comparative analysis of genome evolution. To optimize similarity it was necessary to introduce gaps in translated amino acid sequences (Table 3). In particular, the sequence of DEN2S1 exhibited a deletion of nine nucleotides in NS3 protein, resulting in the loss of amino acids KEF, which are conserved in DEN2JAM and DEN 4 viruses. The genome of DEN2JAM virus has 11 more nucleotides in the 3'-noncoding region than does DEN2S1 virus (Hahn et al., 1988). When compared with DEN2JAM, it appears that DEN 4 virus has 5 and 69 fewer nucleotides in the 5'- and 3'-nontranslated regions, respectively (data not shown), and 15 fewer nucleotides in the coding region (Table 3). Although the 3'-noncoding region seems to diverge between DEN 2 and DEN 4 viruses, the 3'-far extremity is conserved and contains a 20-nucleotide repeat sequence (Fig. 2) that may be important in flavivirus replication (Wengler and Castle, 1986). Nucleotide variation between DEN2JAM and DEN2S1 virus genomes is shown in Table 4. Nucleotide changes appear to be randomly distributed in the genomes, with divergences ranging from 8.5 to 12.4% in the translated region. Only 13.3% of the base changes result in amino acid substitutions; 10.8% are transversions (i.e., purine to pyrimidine or vice versa), and 89.2% are transitions (i.e., purine to purine or pyrimidine to pyrimidine). A portion (4586 nucleotides) of the nonstructural region of DEN2NGC virus has been published (Yaegashi et al., 1986). The DEN2NGC sequence shows 90.3% identity with the nucleotide sequence of DEN2JAM virus in this region. Comparison of the three DEN2 virus genome sequences (Table 5) suggests that DEN2JAM is more closely related to the DEN2NGC
TABLE 3
COMPARATIVE ANALYSIS OF DEN VIRUS NUCLEOTIDE SEQUENCES
Virus      Number of nucleotides missing   Localization in the DEN2JAM nucleotide sequence   Amino acid missing   Protein involved
DEN2JAM is taken as reference
DEN2S1     9                               6352-6360                                         KEF                  NS3
DEN 4      3                               100-102                                           N                    C
           3                               2054-2056                                         F                    E
           3                               3685-3687                                         M                    NS2A
           9                               6902-6910                                         NIL                  NS4B
           3                               10268-10270                                       W                    NS5
DEN 4 is taken as reference
DEN2JAM    3                               4096-4098                                         Y                    NS2A
           3                               9469-9471                                         M                    NS5
TABLE 4
COMPARATIVE STUDY OF DEN2JAM AND DEN2S1 (HAHN ET AL., 1988) VIRUS NUCLEOTIDES
Protein   Gene length (nucleotides)   Transition   Transversion   Total   Silent   Nucleotide divergence (%)
5'ncᵃ     96                          3            2              5       -        2.1
C         342                         25           4              29      24       8.6
pr        273                         31           1              32      22       11.8
M         225                         16           3              19      15       8.5
E         1485                        118          15             133     117      12.4
NS1       1056                        90           8              98      84       9.3
NS2A      654                         59           9              68      60       10.5
NS2B      390                         36           8              44      39       11.2
NS3       1854ᵇ                       172          13             185     168      10.2
ns4a      450                         44           4              48      44       10.7
NS4B      744                         53           11             64      54       8.6
NS5       2700                        223          30             253     216      9.4
3'ncᵃ     453ᵇ                        13           2              15      -        3.3
Total     10723                       886          107            993     843      9.2
ᵃ nc corresponds to the 5'- and 3'-noncoding regions.
ᵇ A gap of 9 nucleotides exists in the DEN2S1 virus NS3 gene and 11 in the 3'-noncoding region.
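The split between transitions and transversions quoted in the text can be recomputed from the totals of Table 4. The Python sketch below first shows how a single base change is classified and then derives the two percentages from the totals 886 and 107. The function name is illustrative, and bases are written in the RNA alphabet.

```python
PURINES = {"A", "G"}   # the pyrimidines are C and U

def classify(ref: str, alt: str) -> str:
    """Label a base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

print(classify("A", "G"), classify("C", "U"), classify("A", "U"))

# Totals from Table 4 (DEN2JAM versus DEN2S1).
transitions, transversions = 886, 107
total = transitions + transversions

print(total)                                                  # 993
print(f"{100 * transitions / total:.1f}% transitions")        # 89.2% transitions
print(f"{100 * transversions / total:.1f}% transversions")    # 10.8% transversions
```

These values match the 89.2% and 10.8% given in the text.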
prototype than to DEN2S1 virus. Nucleotide divergence between DEN2JAM and DEN2NGC viruses shows only a 5.6% change, with 2.3% difference in the translated amino acids. DEN2 nucleotide and amino acid sequences show about 66 and 68% identity, respectively, with those of DEN 1 (5'-3745 nucleotides) and DEN 4 viruses. DEN 1 and DEN 4 viruses show slightly less similarity with each other (63 to 65%), possibly indicating an increased evolutionary distance between these two viruses when compared to DEN 2 viruses. Comparison between DEN 2 and DEN 4 nonstructural proteins indicates that NS2A and NS2B are the least conserved and NS1, NS3, and NS5 are the most conserved (Table 6). NS2 protein showed the greatest evolution after 240 passages of YF Asibi strain (Hahn et al., 1987). The hydrophobic profile of both the structural and nonstructural proteins is highly conserved in the DEN serogroup viruses, even in domains of low sequence similarity (data not shown). In addition, 73% of the amino acid substitutions between viruses involved substitutions of one basic residue for another (17 changes) or substitutions between Ala, Leu, Ile, Val, and Thr (49 changes).
TABLE 5
SIMILARITY (IN PERCENTAGE) BETWEEN THE ALIGNED NUCLEOTIDE (NT) AND AMINO ACID (AA) SEQUENCES OF DEN VIRUSES
AA \ NT     DEN2JAM   DEN2S1   DEN2NGCᵃ   DEN 4   DEN 1ᵃ
DEN2JAM     -         90.7     94.4       66.2    66.2
DEN2S1      96.8      -        93.1       66.1    66.5
DEN2NGCᵃ    97.7      97.6     -          ND      ND
DEN 4       68.2      67.9     NDᵇ        -       64.4
DEN 1       68.8      68.5     ND         62.9    -
ᵃ Only partial sequences are compared (see text).
ᵇ ND, not determined.
Comparative analysis of DEN2, YF, and WN proteins
Flavivirus genome sequence analyses have allowed authors to compare viruses belonging to the same and to different serogroups (Dalgarno et al., 1986; Deubel et al., 1986; Zhao et al., 1986; Trent et al., 1987; Mackow et al., 1987; Hahn et al., 1988; Coia et al., 1988). The amino acid sequence of the DEN2JAM virus corroborates the similarities previously found and confirms our earlier suggestion that DEN 2 virus was more closely related to WN virus than to YF virus (Deubel et al., 1986). Nonstructural proteins NS2A, NS2B, ns4a, and NS4B are the least conserved nonstructural proteins and NS3 and NS5 the most conserved. Each of the flavivirus proteins exhibits regions that have a high degree of similarity. Domains in the DEN, YF, and WN proteins share more than 50% amino acid similarity for a window of 20 residues (Fig. 5). Some sequences in NS5 protein share 10 or more amino acids (Fig. 2). The sequence YADDTAGWDTRIT and the 7-mer conserved sequence SGDDCVV in NS5 show some similarity with viral RNA-dependent RNA polymerases (Sumiyoshi et al., 1987; Mackow et al., 1987). In addition to these sequence similarities, cysteine residues are conserved in the NS1 proteins of flaviviruses, as was observed in prM(M) and E, and two glycosylation sites in NS1 protein are conserved.
DISCUSSION
The sequence data presented for DEN2JAM virus provide additional information about flavivirus genome structure and organization and help clarify the features of flavivirus evolution. Serologic classification of flaviviruses has been confirmed by amino acid sequence comparison in and between serogroups, indicating that they undoubtedly derive from a common ancestor. The percentage of amino acid similarity between DEN 2, YF, and WN ranged from 44 to 51%, while the percentage positional identity was 63 to 68% within the DEN serocomplex. Genetic variants of DEN 2 viruses showed more than 90% similarity in their amino acid sequence. Similar results were obtained for the JE-KUN-SLE-MVE-WN serocomplex (Trent et al., 1987; Coia et al., 1988). Similarities in flavivirus amino acid sequences permit easy identification of cognate
TABLE 6
SIMILARITY ANALYSIS (IN PERCENTAGE) BETWEEN DEN 2 JAMAICA VIRUS AND OTHER FLAVIVIRUS PROTEINS
DEN2JAM   C      pr     M      E      NS1    NS2A   NS2B   NS3    ns4a   NS4B   NS5    Total
YF        13     34     36     43     43.6   21.5   31     52     37.3   33.4   59.1   43.9
WN        33     43.9   32     47     50.7   15.9   29     59     41     36     65.7   49.9
DEN2S1    96.4   92.3   96     97.1   96.8   96.7   96.9   97.5   98     96.7   96.4   96.8
DEN 1     68.1   76.9   69.3   68     73.8                                             68.8
DEN 4     67.2   69.2   64     62.8   72.7   33.1   55.7   76.5   63.3   78.6   77.2   68.2
SEQUENCE OF DENGUE 2 GENOMIC RNA
[Figure 5: bar diagram of the polyprotein (C, prM, E, NS1, NS2A, NS2B, NS3, ns4a, NS4B, NS5) from 5' to 3' against a scale of 0 to 10 kilobases.]
FIG. 5. Schematic representation of the flavivirus polyprotein showing highly conserved regions (shaded areas). Amino acid sequences of yellow fever, West Nile, and dengue viruses were compared pairwise using a window of 20 amino acids. Domains where the three sequences share more than 50% similarity are shaded. Asterisks indicate conserved potential glycosylation sites.
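The pairwise window comparison described in the legend of Fig. 5 can be sketched as follows. This Python example is a simplified illustration and not the authors' program: it scores percent identity rather than a similarity measure (which the legend does not define), it expects sequences that are already aligned to the same length, and the three short sequences are hypothetical.

```python
from itertools import combinations

def window_identity(seq_a: str, seq_b: str, window: int = 20) -> list[float]:
    """Percent identity of two pre-aligned, equal-length sequences in every
    full window. Positions with a gap ('-') count as mismatches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    return [
        100 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]

def conserved_windows(sequences: list[str], window: int = 20,
                      threshold: float = 50.0) -> list[int]:
    """Start positions of windows in which every pair of sequences
    scores above the threshold."""
    profiles = [window_identity(a, b, window)
                for a, b in combinations(sequences, 2)]
    return [i for i, scores in enumerate(zip(*profiles))
            if all(score > threshold for score in scores)]

# Hypothetical aligned fragments, compared with a short window of 5.
aligned = ["MKTAYIAKQR", "MKTAYLSKQR", "MKSAYIAEQR"]
print(window_identity(aligned[0], aligned[1], window=5))
print(conserved_windows(aligned, window=5))
```

Windows that pass the threshold for all three pairs correspond to the shaded domains of the figure; replacing the identity test with a substitution-group test would turn this into a similarity score.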
proteins. A comparison of the individual proteins revealed that NS2 is the least conserved, while NS3 and NS5 are highly conserved. The hydrophobic profile of the flavivirus polyprotein is highly conserved, even in the NS2 region, reflecting the functional importance of conserved protein structure in the life cycle of the virus. In view of recent results suggesting an alternate cleavage site for KUN virus NS2A and NS4B proteins (Speight et al., 1988), flavivirus NS1 would not contain a transmembrane hydrophobic stretch at its C-terminus. This change in the proposed genomic structure of NS1 would bring the molecular weight of NS1 into agreement with that determined by gel electrophoresis. This proposal certainly modifies the proposed role of NS1 in the biology of virus replication and in the immunology of host resistance (Schlesinger et al., 1987). More work must be done to clarify the molecular biology of NS1 and other nonstructural proteins.
Our sequencing of the full DEN2JAM genome has allowed us to make a sequence comparison with a DEN 2 strain derived from the Puerto Rican isolate PR159. Since the sequence was derived from unique cDNA clones, cloning artifacts or cloning of variants in the virus population may result in a sequence that deviates from the average sequence of the parent RNA population by one or several nucleotides. The DEN2PR159 strain was passed 19 times in cell culture and was plaque purified before the S1 vaccine variant was selected (Eckels et al., 1976, 1980). A 0.63% divergence in nucleotide sequence occurred during 243 passages of the Asibi strain of YF virus (Hahn et al., 1987), and therefore we could expect about five nucleotide changes during the in vitro passages of DEN2PR159 virus. Notably, three amino acids were missing near the putative C-terminus of the DEN2S1 NS3 protein as compared with the same protein of DEN2JAM and DEN 4 viruses. It would be interesting to determine whether this modification in sequence was present in the parental DEN2PR159 strain or whether the mutation occurred during in vitro passage. Nucleotide changes in the 3'-noncoding region of DEN 2 viruses corroborate changes observed with Asibi/17D viruses (Hahn et al., 1987) and may reflect a lack of selective pressure for sequence conservation in this region of the genome.
Variations in the DEN genome were previously analyzed by T1 mapping (Trent et al., 1983). DEN2JAM and DEN2PR viruses were among six established genetic variants that shared less than 25% similarity in T1 oligonucleotides. The expected 10% nucleotide divergence (Aaronson et al., 1982) was indeed observed by sequence comparison. Kerschner et al. (1986) were able to select group- and strain-specific DEN oligonucleotides, the sequence of which allowed them to prepare synthetic probes for comparative hybridization studies. The sequence information available permits selection of conserved and variable nucleotide regions in DEN 2 viral RNA for synthesis of complementary oligonucleotides that could be useful in strain typing and evolution studies. Correlation of amino acid sequence analyses of the E proteins, and perhaps the NS1 proteins, of DEN virus strains with epitope mapping using monoclonal antibodies and monoclonal antibody-derived DEN virus mutants should enable investigators to determine those epitopes that are involved in group- or type-specificity. This information is critical for the construction of molecular immunogens.
ACKNOWLEDGMENTS
We thank Drs. Vance Vorndam and Gwong Jen Chang for providing the synthesized oligonucleotides, Joyce Grant for technical assistance, Dr. James Strauss for providing the DEN2S1 virus sequence before publication, Gerard Masson for assistance in computer analyses, and Judy Parizek for expert typing. This research was supported in part by the Direction des Recherches Etudes et Techniques 85 34 820 00 470 75 01 (Paris) and by Fellowship 1F05 TW03596 from the International Research and Awards Branch, National Institutes of Health, Bethesda, Maryland.
REFERENCES
AARONSON, R. P., YOUNG, J. F., and PALESE, P. (1982). Oligonucleotide mapping: Evaluation of its sensitivity by computer simulation. Nucleic Acids Res. 10, 237-246.
BIEDRZYCKA, A., CAUCHI, M. R., BARTHOLOMEUSZ, A., GORMAN, J. J., and WRIGHT, P. J. (1987). Characterization of protease cleavage sites involved in the formation of envelope glycoprotein and three non-structural proteins of dengue virus type 2, New Guinea C strain. J. Gen. Virol. 68, 1317-1326.
CASTLE, E., LEIDNER, U., NOVAK, T., WENGLER, G., and WENGLER, G. (1986). Primary structure of the West Nile flavivirus genome coding for all nonstructural proteins. Virology 149, 10-26.
CASTLE, E., NOVAK, T., LEIDNER, U., WENGLER, G., and WENGLER, G. (1985). Sequence analysis of the viral core protein and the membrane-associated proteins V1 and NV2 of the flavivirus West Nile virus and of the genome sequence for these proteins. Virology 145, 227-236.
COIA, G., PARKER, M. D., SPEIGHT, G., BYRNE, M. E., and WESTAWAY, E. G. (1988). Nucleotide and complete amino acid sequence of Kunjin virus: Definitive gene order and characteristics of the virus-specified proteins. J. Gen. Virol. 69, 1-21.
DALE, R. M. K., MCCLURE, B. A., and HOUCHINS, J. P. (1985). A rapid single-stranded cloning strategy for producing a sequential series of overlapping clones for use in DNA sequencing: Application to sequencing the corn mitochondrial 18S rDNA. Plasmid 13, 31-40.
DALGARNO, L., TRENT, D. W., STRAUSS, J. H., and RICE, C. M. (1986). Partial nucleotide sequence of the Murray Valley encephalitis virus genome: Comparison of the encoded polypeptides with yellow fever virus structural and nonstructural proteins. J. Mol. Biol. 187, 309-323.
DEUBEL, V., KINNEY, R. M., and TRENT, D. W. (1986). Nucleotide sequence and deduced amino acid sequence of the structural proteins of dengue type 2 virus, Jamaica genotype. Virology 155, 365-377.
ECKELS, K. H., BRANDT, W. E., HARRISON, V. R., MCCOWN, J. M., and RUSSELL, P. K. (1976). Isolation of a temperature-sensitive dengue-2 virus under conditions suitable for vaccine development. Infect. Immun. 14, 1221-1227.
ECKELS, K. H., HARRISON, V. R., SUMMERS, P. L., and RUSSELL, P. K. (1980). Dengue 2 vaccine: Preparation from a small-plaque virus clone. Infect. Immun. 27, 175-180.
GUZMAN, M., KOURI, G., MORIER, L., SOLER, M., and FERNANDEZ, A. (1984). A study of fatal hemorrhagic dengue cases in Cuba, 1981. Bull. PAHO 18, 213-220.
HAHN, C. S., DALRYMPLE, J. M., STRAUSS, J. H., and RICE, C. M. (1987). Comparison of the Asibi strain of yellow fever virus with the 17D vaccine strain derived from it. Proc. Natl. Acad. Sci. USA 84, 2019-2023.
HAHN, Y. S., GALLER, R., HUNKAPILLER, T., DALRYMPLE, J. M., STRAUSS, J. H., and STRAUSS, E. G. (1988). Nucleotide sequence of dengue 2 RNA and comparison of the encoded proteins with those of other flaviviruses. Virology 162, 167-180.
HALSTEAD, S. B. (1980). Immunological parameters of Togavirus disease syndromes. In "The Togaviruses" (R. W. Schlesinger, Ed.), Chap. 5, pp. 107-113. Academic Press, New York.
HALSTEAD, S. B. (1987). Selective primary health care: Strategies for control of disease in the developing world. XI. Dengue. Rev. Infect. Dis. 6, 251-264.
KERSCHNER, J. H., VORNDAM, A. V., MONATH, T. P., and TRENT, D. W. (1986). Genetic and epidemiologic studies of dengue type 2 viruses by hybridization using synthetic deoxyoligonucleotides as probes. J. Gen. Virol. 67, 2645-2661.
KYTE, J., and DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
MACKOW, E., MAKINO, Y., ZHAO, B., ZHANG, Y-M., MARKOFF, L., BUCKLER-WHITE, A., GUILER, M., CHANOCK, R., and LAI, C-J. (1987). The nucleotide sequence of dengue 4 virus: Analysis of genes coding for nonstructural proteins. Virology 159, 217-228.
MASON, P. W., MCADA, P. C., MASON, T. L., and FOURNIER, M. J. (1987). Sequence of the dengue-1 virus genome in the region encoding the three structural proteins and the major nonstructural protein NS1. Virology 161, 262-267.
MONATH, T. P., WANDS, J. R., HILL, L. J., BROWN, N. V., MARCINIAK, R. A., WONG, M. A., GENTRY, M. K., BURKE, D. S., GRANT, J. A., and TRENT, D. W. (1986). Geographic classification of dengue-2 virus strains by antigen signature analysis. Virology 154, 313-324.
RICE, C. M., AEBERSOLD, R., TEPLOW, D. B., PATA, J., BELL, J. R.,
VORNOAM. A. V., TRENT. D. W., BRANORISS,M. W., SCHLESINGER, J. J., and STRAUSS, J. H. (1986a). Partial N-terminal amino acid sequences of three nonstructural proteins of two flaviviruses. Virology 151, 1-9. RICE, C. M., LENCHES, E. M., EDDY. S. R., SHIN, S. J., SHEETS, R. L., and STRAUSS, J. H. (1985). Nucleotide sequence of yellow fever virus: Implications for flavivirus gene expression and evolution. Science 229,726-733. RICE, C. M., STRAUSS,E. G., and STRAUSS,J. H. (1986b). Structure of the flavivirus genome. In “Togaviridae and Flaviviridae” (M. Schlesinger and S. Schlesinger, Eds.), Chap. 10. pp. 279-326. Plenum, New York. ROSEN, L. (1986). The pathogenesis of dengue hemorrhagic fever. S. Amer. J. Med. (SuppI.) 11, 40-42. SABATINI, D. D., KREIBICH,G., MORIMOTO, T., and AOESNIK, M. (1982). Mechanisms for the incorporation of proteins in membranes and organelles. J. Ceil Biol. 42, l-22. SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Nat/. Acad. Sci. USA 74, 5463-5467. SCHLESINGER,1. J., BRANORISS,M. W., and WALSH, E. E. (1987). Protection of mice against dengue 2 virus encephalitis by immunization with the dengue 2 virus non-structural glycoprotein NSl. J. Gen. Viral. 68, 853-857. SMITH, G. W., and WRIGHT, P. J. (1985). Synthesis of proteins and glycoprotein in dengue type 2 virus-infected Vero and Aedes a/bopicfus cells. 1. Gen. Viral. 66, 559-571. SPEIGHT, G.. COIA, G., PARKER, M. D., and WESTAWAY, E. G. (1988). Gene mapping and positive identification of the nonstructural proteins NS2A, NS2B, NS3, NS4B and NS5 of the flavivirus Kunjin and their cleavage sites. J. Gen. Virol. 69, 23-34. SUMIYOSHI,H., MORI, C., FUKE, I., MORITA, K., KUHARA,S., KONOOU,J., KIKUCHI,Y., NAGAMATU, H., and IGARASHI,A. (1987). Complete nucleotide sequence of Japanese encephalitis virus genome RNA. Virology 161, 497-510. TRENT, D. W., GRANT, 1. A., ROSEN, L., and MONATH. T. P. (1983). 
Genetic variation among dengue 2 viruses of different geographic origin. Virology 128, 271-284. TRENT, D. W., KINNEY, R. M., JOHNSON, B. J., VORNOAM, A. V., GRANT, 1. A., DEUBEL, V., RICE, C. M., and HAHN, C. S. (1987). Partial nucleotide sequence of St. Louis encephalitis virus RNA: Structural proteins, NSl , ns2a, and ns2b. Virology 256, 293-304. VEZZA, A., ROSEN, L., REPIK, P.. DALRYMPLE, 1. M.. and BISHOP, D. H. L. (1980). Characterization of the viral RNA species of prototype dengue viruses. Amer. J. Trop. Med. Hyg. 29, 643-652. VON HEIJNE,G. (1986). A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14, 4583-4690. WENGLER,G., and CASTLE, E. (1986). Analysis of structural properties which possibly are characteristic for the 3’terminal sequence of the genome RNA of flaviviruses. /. Gen. Viral. 67, 1 183-l 188. WENGLER, G., CASTLE, E., LEIONER, U., NOWAK, T.. and WENGLER, G. (1985). Sequence analysis of the membrane protein V3 of the flavivirus West Nile virus and of its gene. Virology 147, 264-274. YAEGASHI. T., VAKHARIA, V. N., PAGE, K., SASAGURI,Y., FEIGHNY, R.. and PAOMANABHAN, R. (1986). Partial sequence analysis of cloned dengue virus type 2 genome. Gene 46,257-267. YANISH-PERRON,C., VIEIRA. J., and MESSING, J. (1985). Improved Ml3 phage cloning vectors and host strains: Nucleotide sequence of the M13mp18 and pUCl9 vectors. Gene 33, 103-119. ZHAO, B., MACKOW, E., BUCKLER-WHITE,A., MARKOFF, L., CHANOCK, R. M., LAI, C. J.. and MAKINO. Y. (1986). Cloning full-length dengue 4 viral DNA sequences: Analysis of genes coding for structural proteins. Virology 155, 77-88.