VIROLOGY
175, 167-175 (1990)
Nucleotide Sequence Analysis of the S Genomic Segment of Prospect Hill Virus: Comparison with the Prototype Hantavirus
MARK A. PARRINGTON AND C. YONG KANG¹
Department of Microbiology and Immunology, University of Ottawa, Faculty of Medicine, Ottawa, Ontario, Canada K1H 8M5
Received July 25, 1989; accepted October 24, 1989
The genomic S RNA segment of Prospect Hill virus (PHV), a member of the Hantavirus genus, was molecularly cloned and the nucleotide sequence of the cDNA determined. The PHV S RNA segment is 1675 nucleotides long. A long open reading frame was identified in the viral complementary-sense RNA that could encode a 433 amino acid (49K) nucleocapsid (N) protein. Comparison with the sequence of the S RNA segment of the related Hantavirus Hantaan 76-118 indicated that there was 57% nucleotide sequence homology between the two S RNA segments. A higher degree of amino acid sequence homology (62%) was observed between the N proteins of these viruses. At the N-terminus, 147 of 225 amino acids are homologous, while approximately 82% of the 124 amino acids at the C-terminus are homologous between the two N proteins. The longest stretch of homologous amino acid sequence is found in the C-terminal region and is 17 amino acids in length. Also, many of the differences in amino acid sequence between the two N proteins resulted from conservative substitutions. Hydropathy plots of the two N proteins also reveal many similarities, including a conserved potential antigenic site. Unlike in Hantaan virus, a second, smaller, overlapping open reading frame was observed in the viral complementary-sense RNA of PHV and could potentially encode a 90 amino acid (10.5K) protein. Our data indicate that the N proteins of PHV and Hantaan virus are closely related despite divergence in the nucleotide sequence of their S RNA segments. © 1990 Academic Press, Inc.
INTRODUCTION
Prospect Hill virus (PHV) is a member of the Hantavirus genus (Schmaljohn et al., 1985) in the Bunyaviridae family and is antigenically related to Hantaan virus, the etiologic agent of hemorrhagic fever with renal syndrome (Lee et al., 1978). PHV was isolated from the lung tissue of meadow voles (Microtus pennsylvanicus) captured in Frederick, Maryland (Lee et al., 1982, 1985). In contrast to other Hantaviruses, there is no evidence linking PHV to disease in man (Yanagihara et al., 1984, 1987). Hantaviruses, like most Bunyaviridae, possess a tripartite, single-stranded RNA genome with a negative polarity (Schmaljohn and Dalrymple, 1983). The three RNA segments, large (L), medium (M), and small (S), have molecular weights of approximately 2.2, 1.3, and 0.5 × 10^6, respectively (Yoo and Kang, 1987). All members of the Bunyaviridae family examined to date, except Hantaan virus, encode a nucleocapsid (N) protein and a nonstructural (NSs) protein with the S RNA segment. However, the coding strategies of the S segments differ between genera. Members of the Bunyavirus genus use overlapping open reading frames to encode a 19K-26K N protein and a 10K NSs protein. Viruses in the Phlebovirus genus, however, encode a 26K N protein and a 29K NSs protein using an ambisense coding strategy (Bishop, 1985). Unlike other Bunyaviridae, the Hantaan virus S RNA segment appears to encode only a 48K N protein using a single long open reading frame (Schmaljohn et al., 1986).
In this paper, we present the nucleotide sequence of the PHV S RNA segment and the deduced amino acid sequence of its N protein. These sequences are compared to the analogous sequences of the prototype Hantaan virus strain 76-118. Our data indicate that there is a considerable degree of amino acid sequence homology between the N proteins of the two viruses.
MATERIALS AND METHODS
Viruses, cells, and media
Prospect Hill virus (strain Prospect Hill-1), provided by Dr. J. M. Dalrymple (USAMRIID, Frederick, MD), was propagated in Vero E6 cells (ATCC 1008, CRL 1586). Cells were grown in Dulbecco's modified Eagle's medium, supplemented with 10% heat-inactivated fetal bovine serum and 2 mM L-glutamine. Virus was propagated at 35° in 100-mm tissue culture dishes and harvested 8, 12, and 15 days after infection.
Virus isolation and RNA purification
Virus particles in clarified tissue culture fluid were pelleted by centrifugation for 2 hr at 81,000 g at 4° in a
¹ To whom requests for reprints should be addressed.
0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
Beckman SW 28 rotor (Cedar Grove, NJ). Virus pellets were resuspended and viruses were disrupted as previously described (Yoo and Kang, 1987). RNA was extracted with phenol-chloroform-isoamyl alcohol (25:24:1), reextracted with chloroform-isoamyl alcohol (24:1), and stored at -70° until purification. The purification of PHV RNA was based on the procedure of Chirgwin et al. (1979). In a typical experiment, a 2.7-ml solution of PHV RNA was mixed with 2.918 g of CsCl and layered on a 1-ml cushion of 5.7 M CsCl buffered with 25 mM sodium acetate (pH 5.5). RNA was pelleted by centrifugation for 18 hr in a Beckman SW 50.1 rotor at 121,000 g at 25°. Pelleted RNA was resuspended in H2O by brief heating at 68° and then ethanol precipitated.
Molecular cloning and sequence analysis of PHV S RNA
Virion RNA was used as a template for the synthesis of cDNA as described (Gubler and Hoffman, 1983). First-strand cDNA synthesis was done in a reverse transcriptase reaction mixture containing 1 mM dTTP, 1 mM dCTP, 1 mM dGTP, 200 µM dATP, 100 mM Tris (pH 8.4), 130 mM KCl, 10 mM MgCl2, 1 mM DTT, 20 units of RNasin (Promega Biotec, Madison, WI), and 50 µCi of [α-32P]dATP (800 Ci/mmol) (Amersham, Arlington Heights, IL). The 40-µl reaction mixture also contained 40 units of AMV reverse transcriptase (Pharmacia Biotechnology, Uppsala, Sweden), approximately 10 ng of viral RNA, and 200 ng of random primers (Pharmacia Biotechnology) and was incubated for 1 hr at 42°. The cDNA was converted to double-stranded DNA by adding 1/10th volume of 10X nick-translation buffer (0.5 M Tris, pH 7.2, 0.1 M MgSO4, 0.01 M DTT, 500 µg/ml BSA fraction V), 2 units of RNase H (BRL, Gaithersburg, MD), and 30 units of Escherichia coli DNA polymerase I (Pharmacia Biotechnology). The reaction was incubated for 2 hr at 15° and then terminated by the addition of EDTA and SDS to final concentrations of 50 mM and 1%, respectively.
The double-stranded cDNA was size fractionated on an agarose gel, and large cDNA molecules were ligated to phosphorylated PstI linkers (New England Biolabs, Beverly, MA), digested with the restriction endonuclease PstI, ligated into the PstI site of pUC19, and used to transform JM101 cells. Specificity of the inserts in the resulting clones was determined by Northern blot analysis using PHV RNA. The largest S RNA-specific insert, P275 (1176 bp), was subcloned into the bacteriophage M13mp19 (Messing, 1983) and sequenced in both directions by the dideoxy chain termination method of Sanger et al. (1977) using several synthetic primers
synthesized on an Applied Biosystems Model 380B DNA synthesizer. Nucleotide sequences were determined both by manual sequencing and by automated sequencing using the DuPont Genesis 2000 automated DNA sequencer.
To obtain the 3′-terminal clone, two oligonucleotides (MP11 and MP12) were synthesized (Fig. 1). Primer MP11 has 21 nucleotides complementary to the 3′ terminus of the PHV S RNA (Schmaljohn et al., 1985) and was used for first-strand cDNA synthesis as described above. Primer MP12 has 20 nucleotides complementary to PHV S cDNA (nucleotide positions 497-516, measuring from the 3′ end of the S RNA) as determined by nucleotide sequence analysis of clone P275. Amplification of the cDNA primed with MP11 was performed with the GeneAmp kit and method provided by Perkin-Elmer Cetus Corp. (Norwalk, CT) using 100 ng each of primers MP11 and MP12. Both primers had PstI sites at their 5′ ends so the resultant DNA fragment PS3 could be inserted into the PstI site of pUC19.
The 5′-terminal clone of the PHV S RNA was prepared in essentially the same way as the 3′-terminal clone. Two oligonucleotides, MP13 and MP14, were synthesized (Fig. 1). Primer MP13 had 20 nucleotides complementary to virion S RNA (nucleotide positions 1451-1470, from the 3′ end of the S RNA) as determined by nucleotide sequence analysis of clone P275. Primer MP14 had 8 nucleotides identical to the predicted 5′ terminus of virion S RNA. Prediction of the 5′ terminus was based on the usually observed inverse complementarity between the 3′ and 5′ termini of Hantaan virus RNA segments (Schmaljohn et al., 1986, 1987). Primer MP13 was used to make first-strand cDNA as described above, and this cDNA was amplified as previously described with 50 ng each of primers MP13 and MP14. The PstI sites at the 5′ ends of these primers allowed insertion of PS5 DNA into the PstI site of pUC19. Clones PS3 and PS5 were sequenced in both directions using the methods described above.
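The primer design described above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the primer sequences are taken as read from Fig. 1, the 3′-terminal RNA sequence is the one quoted in Results, and every function and variable name is hypothetical. It splits each primer into its 9-nucleotide PstI-bearing extension and its annealing region, confirms that MP11 pairs with the reported 3′ terminus, confirms that MP14 matches the 5′ terminus predicted from inverse complementarity, and adds the two extensions to the insert sizes.

```python
PST_I_SITE = "CTGCAG"
LINKER_NT = 9  # 5' extension carried by every primer

# Sequences as printed in Fig. 1, with the orientation in which they are drawn.
FIG1_PRIMERS = {
    "MP11": ("CAGCTGCAGTAGTAGTAGACTTCGTAAAGA", "5to3"),
    "MP12": ("TATTCCCCTGTTCCTAGTCCGACGTCGAC", "3to5"),
    "MP13": ("CAGCTGCAGCTAATTCACAATTAAACATT", "5to3"),
    "MP14": ("ATGATGATGACGTCGAC", "3to5"),
}

# Reported 3'-terminal sequence of the PHV S RNA, written 3'->5'.
S_RNA_3PRIME_3TO5 = "AUCAUCAUCUGAAGCAUUUCUCGAU"

_RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def primer_5to3(name):
    """Return the primer written 5'->3', reversing it if drawn 3'->5'."""
    seq, orientation = FIG1_PRIMERS[name]
    return seq if orientation == "5to3" else seq[::-1]


def split_primer(name):
    """Split a primer into (5' extension, annealing region)."""
    seq = primer_5to3(name)
    return seq[:LINKER_NT], seq[LINKER_NT:]


def dna_partner(rna_3to5):
    """DNA strand, read 5'->3', that base-pairs with an RNA written 3'->5'."""
    return "".join(_RNA_TO_DNA_PARTNER[base] for base in rna_3to5)


def amplicon_length(insert_bp):
    """Amplified fragment size: insert plus one extension at each end."""
    return insert_bp + 2 * LINKER_NT


if __name__ == "__main__":
    for name in FIG1_PRIMERS:
        extension, annealing = split_primer(name)
        print(name, extension, annealing, len(annealing))
    print(amplicon_length(516), amplicon_length(224))
```

Because the 5′ terminus is predicted as the base-paired partner of the 3′ terminus, the same `dna_partner` helper serves both checks; MP14 carries the partner of the first 8 bases and MP11 the partner of the first 21.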
Computer analysis of nucleotide and amino acid sequences
Analysis of RNA secondary structure, searches for potential open reading frames, translation of DNA sequences, hydropathy plots, and searches for potential antigenic sites were performed using the PC/Gene program (Version 5.11, 1987, Department of Medical Biochemistry, University of Geneva, Switzerland, distributed by Intelligenetics, Inc., Mountain View, CA). Secondary structure analysis of the RNA was based on Zucker's method (Zucker and Stigler, 1981) with modifications (Jacobson et al., 1984).
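As a rough illustration of the open reading frame search mentioned above, the following Python sketch lists stop codon positions in all six reading frames and counts the codons between a start and a stop codon. It is a generic stand-in, not the PC/Gene algorithm, and the function names and frame labels are hypothetical. Positions are 1-based and refer to the first base of each codon, the convention used for the coordinates reported in Results.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_positions(seq, frame):
    """1-based start positions of stop codons in frame 0, 1 or 2."""
    seq = seq.upper()
    return [
        i + 1
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def six_frame_stops(seq):
    """Stop codon positions for three forward and three reverse frames.

    Reverse-frame positions are counted along the reverse complement.
    """
    rc = reverse_complement(seq)
    frames = {}
    for frame in range(3):
        frames[f"fwd{frame + 1}"] = stop_positions(seq, frame)
        frames[f"rev{frame + 1}"] = stop_positions(rc, frame)
    return frames


def codons_between(start_pos, stop_pos):
    """Sense codons from an ATG at start_pos up to the stop at stop_pos."""
    span = stop_pos - start_pos
    if span <= 0 or span % 3:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


if __name__ == "__main__":
    toy = "ATGAAATAGCC"  # ATG AAA TAG: two sense codons, stop at position 7
    print(six_frame_stops(toy))
    print(codons_between(1, 7))
    print(codons_between(43, 1342))  # coordinates reported for the N ORF
```

With the coordinates given for the long open reading frame (ATG at position 43, TAG at position 1342), `codons_between` returns 433, matching the stated length of the N protein. The same arithmetic places the stop codon of a 90-codon frame that opens at position 83 at position 353.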
[Figure 1. Map of the PHV S genome (scale in kb, total length 1.675 kb) showing clones PS3 (516 bp), P275 (1176 bp), and PS5 (224 bp) and the annealing positions of primers MP11 to MP14; panels A and B show the 534-bp and 242-bp amplification products beside size markers.]
MP11: 5′-CAGCTGCAGTAGTAGTAGACTTCGTAAAGA-3′
MP12: 3′-TATTCCCCTGTTCCTAGTCCGACGTCGAC-5′
MP13: 5′-CAGCTGCAGCTAATTCACAATTAAACATT-3′
MP14: 3′-ATGATGATGACGTCGAC-5′
FIG. 1. Cloning strategy of the PHV S RNA genomic segment. The PHV S RNA segment is represented by the solid line at the top of the figure. The size of clones PS3, P275, and PS5 and the portion of the PHV S RNA they represent are shown below. The clones are represented as double-stranded to illustrate where the primers anneal. The sequence of primers MP11, MP12, MP13, and MP14 is shown below the clones. Dark solid lines above the primers indicate sequence homologous with viral RNA. Dark solid lines below the primers indicate viral complementary sequence. Light solid lines indicate the PstI sites on all the primers that allowed insertion of the amplified products into the PstI site of pUC19. (A) Lane 1, HaeIII-digested φX174 phage DNA fragments as molecular weight markers; lane 2, one-fifth of the DNA from amplification of the 3′ end of the PHV S RNA segment using primers MP11 and MP12. The 534-bp fragment is PS3 with the additional 9 bp from each of the primers. (B) Lane 1, one-tenth of the DNA from amplification of the 5′ end of the PHV S RNA segment using primers MP13 and MP14. The 242-bp fragment is PS5 with the additional 9 bp from each of the primers; lane 2, HaeIII-digested φX174 phage DNA fragments as molecular weight markers.
Hydropathy plots were generated according to the method of Kyte and Doolittle (1982), while searches for potential antigenic sites were done using the method of Hopp and Woods (1981).
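The sliding-window calculation behind a Kyte and Doolittle hydropathy plot can be sketched as follows. The residue values are the published Kyte-Doolittle hydropathy indices; the window of seven residues matches the interval quoted in the legend of Fig. 6. The function name is hypothetical, and the sketch reports a plain window average, so its vertical scale may differ from the one used by the SOAP routine in PC/Gene.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return (center residue number, mean hydropathy) for each window.

    Residue numbers are 1-based; positive means indicate hydrophobic
    stretches and negative means hydrophilic ones.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        center = start + window // 2 + 1
        profile.append((center, mean))
    return profile


if __name__ == "__main__":
    # First 20 residues of the predicted PHV N protein as printed in Fig. 5.
    for center, mean in hydropathy_profile("MSQLRKIQEEITRHEQQLVI"):
        print(center, f"{mean:+.2f}")
```

Plotting the second element of each pair against the first gives the profile; a sustained positive run corresponds to a hydrophobic domain of the kind discussed for the two N proteins.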
RESULTS
Molecular cloning of the PHV S RNA segment and nucleotide sequence analysis of its cDNA
A bank of cDNA clones representing PHV RNA sequences was made using random primers as described under Materials and Methods. S RNA-specific clones were sized, and the largest clone (P275) was isolated from this bank. Since clone P275 had only 1176 bases of the S RNA, this clone could not represent the entire PHV S RNA segment. The nucleotide sequence of this clone was determined using the dideoxy chain termination method (Sanger et al., 1977) and compared with the nucleotide sequence of the Hantaan virus (strain 76-118) S RNA segment (Schmaljohn et al., 1986). This comparison revealed that clone P275 lacked approximately 500 bases from the 3′ end of the PHV S RNA and approximately 50 bases from the 5′ end. We decided to clone the 3′ and 5′ ends of the PHV S RNA by gene amplification using Taq DNA polymerase. Amplification of the 3′ end of the PHV S RNA was done using primers MP11 and MP12 (Fig. 1) as described under Materials and Methods. Amplification generated a 534-bp fragment (Fig. 1A, lane 2) that was cloned into the PstI site of pUC19 and designated PS3. Nucleotide sequence analysis revealed that PS3 contained the reported 3′-terminal sequence for PHV S RNA (3′-AUCAUCAUCUGAAGCAUUUCUCGAU) (Schmaljohn et al., 1985). Therefore, clone PS3 most
[Figure 2. Aligned nucleotide sequences of the PHV S segment (nucleotides 1 to 1675, upper lines) and the Hantaan virus 76-118 S segment (nucleotides 1 to 1696, lower lines).]
FIG. 2. Nucleotide sequences of the Prospect Hill virus (PHV) and Hantaan virus (strain 76-118) S RNA segments, shown as the viral complementary DNA (5′-3′). The Hantaan virus S RNA sequence was obtained from Schmaljohn et al. (1986). Nucleotides are numbered on the left and right ends of each line. Nucleotides in the Hantaan virus sequence which differ from those in the PHV sequence are listed below the PHV sequence. Missing nucleotides are represented by a dash (-). The initiation codon (ATG) and termination codon (TAG) of the putative PHV N protein are indicated with a thick overline. The initiation codon (ATG) and termination codon (TGA) of the small open reading frame are indicated with a thin overline.
likely represented the 3′ end of the PHV S RNA segment. The 5′ end of the S RNA was also cloned using the gene amplification method (see Materials and Methods) with primers MP13 and MP14 (Fig. 1). The 242-bp amplified product (Fig. 1B, lane 1) was ligated into the PstI site of pUC19 and designated PS5. Nucleotide sequencing of PS5 demonstrated that clone P275 was missing only 20 nucleotides from the 5′ terminus of the PHV S RNA segment. The nucleotide sequences of clones PS3 and PS5 were determined by the dideoxy chain termination method (Sanger et al., 1977). The combined and consensus sequences (1675 bases) of PS3, P275, and PS5 are shown in Fig. 2. Comparison of the 3′ and 5′ termini of the consensus sequence in the vRNA sense revealed a complementary sequence involving 20 of the terminal 23 nucleotides (Fig. 3). The base-paired structure that could result had a calculated free energy of -15 kcal/mol. Complementarity between the terminal nucleotides of an RNA segment is characteristic of Bunyaviridae (Parker and Hewlett, 1981; Bishop et al., 1982; Cabridilla et al., 1983; Ihara et al., 1984; Eshita and Bishop, 1984; Collett et al., 1985; Schmaljohn et al., 1986, 1987). Therefore, we concluded that clones PS3, P275, and PS5 represented the entire PHV S RNA segment. Base composition of the PHV S RNA was calculated as 25.9% A, 20.9% G, 21.7% C, and 31.5% U.
[Figure 3. Base-pairing diagram of the 3′ and 5′ termini of the PHV S RNA.]
3′-AUCAUCAUCUGA--AGCAUUUCUCG
5′-UAGUAGUAUACUCCUUG-AAA-AGC
FIG. 3. Predicted secondary structure of the complementary 3′ and 5′ termini of the PHV S RNA segment. The terminal 23 nucleotides of the S RNA are shown. The free energy value for this structure was calculated using the RNA folding program in the PC/Gene computer program (Version 5.11).
Features of the PHV S RNA segment
A comparison was made between the nucleotide sequence determined for the PHV S segment and the published sequence of the Hantaan virus (strain 76-118) S RNA segment (Fig. 2). At 1675 bases, the PHV S RNA is 21 nucleotides shorter than the Hantaan virus S RNA. Using a best-fit comparison, there is approximately 57% sequence homology between the two S RNAs. Stretches of homologous sequence were short, less than 15 nucleotides in length. Positions of translation termination codons were determined for the six potential reading frames of the PHV S RNA sequence. A large open reading frame (ORF) was observed in frame 1 of the S cRNA (Fig. 4). The first potential in-frame methionine initiation codon is at nucleotide position 43 from the 5′ terminus of the cRNA (Fig. 2). The flanking sequence for this initiation codon is not considered optimum for translation initiation since there is a G in the -3 position and an A in the +4 position (Kozak, 1984, 1986a,b). However, there are no other in-frame methionine initiation codons until nucleotide position 261. This ORF extends to a termination codon (TAG) at nucleotide position 1342 and could encode a 433 amino acid N protein with a molecular weight of approximately 49K. These values are very similar to those reported for the Hantaan virus N protein (429 amino acids and 48K) (Schmaljohn et al., 1986). A second, smaller, overlapping ORF was observed in frame 2 of the PHV S cRNA (Fig. 4), giving a potential in-frame methionine initiation codon at nucleotide position 83 (Fig. 2). With a G at the -3 position and an A
[Figure 4. Map of translation termination codons in reading frames C1, C2, and C3 (viral complementary sense) and V1, V2, and V3 (viral sense); the first ATG of each of the two longest open reading frames is marked in C1 and C2. Horizontal axis: nucleotides (500, 1000, 1500).]
FIG. 4. Translation termination codon locations in all reading frames of the PHV S RNA segment. Translation termination codons in the PHV S RNA are indicated by arrows in the three reading frames of the viral complementary sense (C) and viral sense (V). The first potential ATG initiation codon in each of the two longest open reading frames is also indicated by arrows.
[Figure 5. Aligned amino acid sequences of the PHV N protein (residues 1 to 433, upper lines) and the Hantaan virus 76-118 N protein (residues 1 to 429, lower lines).]
FIG. 5. Predicted amino acid sequence of the PHV and Hantaan virus N proteins. The PHV N protein amino acid sequence was determined from the analysis of cDNA clones representing the S RNA segment. The Hantaan virus N protein sequence was obtained from Schmaljohn et al. (1986). Amino acids are numbered on the left and right ends of each line. Homologous amino acids in the PHV and Hantaan virus (strain 76-118) N proteins are indicated by vertical lines (|) between their amino acid sequences. A conservative amino acid change is indicated by an asterisk (*) between the amino acid sequences. Missing amino acids are indicated by a dash (-).
at the +4 position, this flanking sequence is also not considered optimum for translation initiation (Kozak, 1984, 1986a,b). However, this is the same sequence flanking the initiation codon of the PHV N protein. With a termination codon (TGA) at nucleotide position 353, this ORF could encode a 90 amino acid protein with an approximate molecular weight of 10.5K. An ORF of comparable size was not observed in the Hantaan virus S RNA sequence (Schmaljohn et al., 1986). No other reading frames had ORFs capable of encoding proteins with more than 50 amino acids (Fig. 4).
Comparison of the PHV and Hantaan virus N proteins
A best-fit comparison was made between the predicted amino acid sequences of the N proteins of PHV and Hantaan virus (strain 76-118) (Fig. 5). There is approximately 62% overall amino acid sequence homology between the two proteins. The PHV N protein is
slightly larger, having four additional amino acids. Of the 267 homologous amino acids, 130 were encoded by the same codon, while another 120 used a codon differing only at the third position. There were 166 differences in the amino acid sequence, but 58 were considered conservative substitutions (R-K; S-T; D-E; Q-N; V-L-I-M; A-G; A-V; Y-F). On the basis of comparison of the PHV and Hantaan N proteins, it appears that the N protein has two conserved regions and one nonconserved region. The first conserved region includes the first 225 amino acids measuring from the N-terminus of each N protein. In this region, 147 amino acids were identical in type and position between the two N proteins. Of the 78 amino acid changes, 45 were nonconservative. Therefore, 80% of the amino acids in this region were identical or were conservative substitutions. The second conserved region consists of the last 124 amino acids. Within this region, 101 of the amino acids were identical in type and position. Conservative amino acid changes accounted for 10 of the differences in this region between the two virus N proteins. Consequently, only 10% of this region contained nonconservative amino acid changes. The two longest uninterrupted stretches of homologous sequence, consisting of 17 and 16 amino acids, are within this region. The 84 amino acid region between the two conserved regions was not as highly conserved and is the location of the 4 additional amino acids in the PHV N protein. Approximately 40% of the amino acid differences between the two N proteins are clustered within this region. Of the 66 changes in amino acid sequence in this region, only 15 resulted from conservative substitutions. Another difference between the two N proteins was their theoretical isoelectric points. The theoretical isoelectric points of the PHV and Hantaan virus N proteins are 5.53 and 6.70, respectively.
[Figure 6. Hydropathy profiles of the PHV and Hantaan virus N proteins, one panel per protein, plotted against amino acid position (horizontal axis, 80 to 400; vertical axis, -30 to 30), with the CHD and AD regions marked.]
FIG. 6. Hydrophobicity and hydrophilicity plots of the PHV and Hantaan virus N proteins were generated using the SOAP program in the PC/Gene computer program (Version 5.11) and an interval of seven amino acids. A common hydrophobic domain (CHD) is indicated in the plots by an overline. A similar antigenic domain is shown in the plots with an underline.
We compared hydropathy plots of the PHV and Hantaan virus N proteins (Fig. 6). As expected, areas of relatively conserved amino acid sequence have similar profiles, while less conserved areas are dissimilar. Both N proteins have a 20 amino acid hydrophobic domain located 114 amino acids from the C-terminus. We
designated this as a conserved hydrophobic domain (CHD), since 16 of the 20 amino acids are identical and only one substitution was nonconservative. A hydrophilic peak common to the PHV and Hantaan virus N proteins was predicted by the PC/Gene program to contain a good antigenic domain (AD). This may be a cross-reacting antigenic domain, since 5 of 6 amino acids composing the epitope are identical, and the single substitution was conservative.
DISCUSSION
PHV and Hantaan virus (strain 76-118) are antigenically related viruses isolated from two different hosts on two different continents (Lee et al., 1978, 1982, 1985). There was 57% nucleotide sequence homology between the two virus S RNA segments although these viruses were apparently evolving independently. The amino acid sequence of the PHV and Hantaan virus N proteins was even more highly conserved. This degree of conservation of the Hantavirus N proteins suggests that they share large functionally important domains. However, the function of these conserved regions remains to be determined. The extreme conservation of
the last 124 amino acids at the C-terminus may indicate that this region contains a functionally important region of the N protein.
Bunyaviruses use overlapping open reading frames to encode their N and NSs proteins (Bishop, 1985). Hantaan virus, however, appears to encode only the N protein with its S RNA segment (Schmaljohn et al., 1986). Therefore, the presence of a second, shorter, overlapping open reading frame in the PHV S RNA was unexpected. However, the protein this open reading frame could potentially encode (90 amino acids, 10.5K) is very similar in size to the NSs proteins of La Crosse, snowshoe hare (92 amino acids, 10.4K), and Aino (91 amino acids, 10.5K) bunyaviruses (Bishop et al., 1982; Akashi and Bishop, 1983; Akashi et al., 1984). At this time, there is no evidence that PHV or any other Hantavirus encodes such a protein. We are presently attempting to determine if PHV expresses an NSs protein utilizing this open reading frame.
The 3′ and 5′ ends of the PHV S RNA were cloned using gene amplification with Taq DNA polymerase. The error frequency reported for this enzyme during a 30-cycle amplification is 0.25% (Saiki et al., 1988). However, others in our laboratory used this technique to clone a human parainfluenza virus 3 gene of known sequence and found an error frequency of only 0.03% over a 30-cycle amplification (Murphy et al., Virus Research, in press). In both studies all of the errors were base substitutions; no additions or deletions were observed. Clone PS5 had only 12 nucleotides not represented by clone P275 or the primer MP14. There were no errors in the overlapping portion of PS5, and therefore the chance of an error in one of the unknown 12 nucleotides is remote. Clone PS3, however, had 475 nucleotides not represented by the primer MP11 or overlapping clone P275. Given the error rates reported for Taq DNA polymerase, this clone could have between 0.14 and 1.19 misincorporations.
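The misincorporation estimate above is a simple product of the number of amplification-derived nucleotides and the per-nucleotide error frequency. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and the second function adds an assumption that is not made in the text, namely that errors occur independently at each position, in order to express the same figures as a chance of an error-free clone.

```python
def expected_misincorporations(novel_nt, error_frequency):
    """Expected number of substitutions among amplification-derived bases.

    error_frequency is the cumulative per-nucleotide error frequency over
    the whole amplification, given as a fraction (0.25% -> 0.0025).
    """
    return novel_nt * error_frequency


def probability_error_free(novel_nt, error_frequency):
    """Chance that none of the bases carries an error.

    Assumes errors are independent between positions (an assumption made
    here for illustration only).
    """
    return (1.0 - error_frequency) ** novel_nt


if __name__ == "__main__":
    reported_rates = {"Saiki et al. (1988)": 0.0025, "Murphy et al.": 0.0003}
    clones = {"PS3": 475, "PS5": 12}
    for clone, novel_nt in clones.items():
        for source, rate in reported_rates.items():
            expected = expected_misincorporations(novel_nt, rate)
            clean = probability_error_free(novel_nt, rate)
            print(clone, source, f"{expected:.4f}", f"{clean:.3f}")
```

For the 475 nucleotides of PS3 the two reported error frequencies give expected values of 0.1425 and 1.1875, which correspond to the quoted range of 0.14 to 1.19 misincorporations.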
In conclusion, our data indicate that although there was nucleotide sequence divergence between the Hantaan virus and PHV S RNA segments, their N proteins are highly conserved.
ACKNOWLEDGMENTS
We thank Dr. K. E. Wright, Dr. E. G. Brown, Dr. H. C. Birnboim, and Dr. K. Dimock for their constructive review of this manuscript. We also thank N. Delcellier for preparing oligonucleotide primers and B. Mah for operation of the automated DNA sequencer. This study is supported by grants from the Natural Sciences and Engineering Research Council of Canada, and the URIF program from the Ontario Ministry of Colleges and Universities.
REFERENCES
AKASHI, H., and BISHOP, D. H. L. (1983). Comparison of the sequences and coding of La Crosse and snowshoe hare bunyavirus S RNA species. J. Virol. 45, 51-63.
AKASHI, H., GAY, M., IHARA, T., and BISHOP, D. H. L. (1984). Localized conserved regions of the S RNA gene products of bunyaviruses are revealed by sequence analyses of the Simbu serogroup Aino virus. Virus Res. 1, 51-63.
BISHOP, D. H. L. (1985). The genetic basis for describing viruses as species. Intervirology 24, 79-93.
BISHOP, D. H. L., GAY, M. E., and MATSUOKO, Y. (1983). Non-viral heterogenous sequences are present at the 5′ ends of one species of snowshoe hare bunyavirus S complementary RNA. Nucleic Acids Res. 11, 6409-6418.
BISHOP, D. H. L., GOULD, K. G., AKASHI, H., and CLERX-VAN HAASTER, C. M. (1982). The complete nucleotide sequence and coding content of snowshoe hare bunyavirus small (S) viral RNA species. Nucleic Acids Res. 10, 3703-3713.
CABRIDILLA, C. D., JR., HOLLOWAY, B. D., and OBIJESKI, J. F. (1983). Molecular cloning and sequencing of the La Crosse virus S RNA. Virology 128, 463-468.
CHIRGWIN, J. M., PRZYBYLA, A. E., MACDONALD, R. J., and RUTTER, W. J. (1979). Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18, 5294-5299.
COLLETT, M. S., PURCHIO, A. F., KEEGAN, K., FRAZIER, S., HAYS, W., ANDERSON, D. K., PARKER, M. D., SCHMALJOHN, C., SCHMIDT, J., and DALRYMPLE, J. M. (1985). Complete nucleotide sequence of the M RNA segment of Rift Valley fever virus. Virology 144, 228-245.
ESHITA, Y., and BISHOP, D. H. L. (1984). The complete sequence of the M RNA of snowshoe hare bunyavirus reveals the presence of internal hydrophobic domains in the viral glycoprotein. Virology 137, 227-240.
GUBLER, U., and HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 269-273.
HOPP, T. P., and WOODS, K. R. (1981). Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78, 3824-3828.
IHARA, T., AKASHI, H., and BISHOP, D. H. L. (1984). Novel coding strategy of ambisensed genomic RNA revealed by sequence analysis of Punta Toro phlebovirus S RNA. Virology 136, 293-306.
JACOBSON, A. B., GOOD, L., SIMONETTI, J., and ZUCKER, M. (1984). Some simple computational methods to improve the folding of large RNAs. Nucleic Acids Res. 12, 45-66.
KOZAK, M. (1984). Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo. Nature (London) 308, 241-246.
KOZAK, M. (1986a). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
KOZAK, M. (1986b). Regulation of protein synthesis in virus-infected animal cells. Adv. Virus Res. 31, 229-292.
KYTE, J., and DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
LEE, H. W., LEE, P. W., and JOHNSON, K. M. (1978). Isolation of the etiologic agent of Korean hemorrhagic fever. J. Infect. Dis. 137, 298-308.
LEE, P. W., AMYX, H. L., GAJDUSEK, D. C., YANAGIHARA, R. T., GOLDGABER, D., and GIBBS, C. J., JR. (1982). New hemorrhagic fever with renal syndrome-related virus in indigenous wild rodents in the United States. Lancet 2, 1405.
LEE, P. W., AMYX, H. L., YANAGIHARA, R., GAJDUSEK, D. C., GOLDGABER, D., and GIBBS, C. J., JR. (1985). Partial characterization of Prospect Hill virus isolated from meadow voles in the United States. J. Infect. Dis. 152, 826-829.
MESSING, J. (1983). New M13 vectors for cloning. In "Methods in Enzymology" (R. Wu, L. Grossman, and K. Moldave, Eds.), pp. 125-145. Academic Press, Orlando, FL.
PARKER, M. D., and HEWLETT, M. J. (1981). The 3’ terminal nucleotide sequences of Uukuniemi and Inkoo virus RNA segments. In “Replication of Negative Strand Viruses,” pp. 125-145. Elsevier, New York.
SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B., and ERLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
SCHMALJOHN, C. S., and DALRYMPLE, J. M. (1983). Analysis of Hantaan virus RNA: Evidence for a new genus of Bunyaviridae. Virology 131, 482-491.
SCHMALJOHN, C. S., HASTY, S. E., DALRYMPLE, J. M., LEDUC, J. W., LEE, H. W., VON BONSDORFF, C. H., BRUMMER-KORVENKONTIO, M., VAHERI, A., TSAI, T. F., REGNERY, H. L., GOLDGABER, D., and LEE, P. W. (1985). Antigenic and genetic properties of viruses linked to hemorrhagic fever with renal syndrome. Science 227, 1041-1044.
SCHMALJOHN, C. S., JENNINGS, G. B., HAY, J., and DALRYMPLE, J. M. (1986). Coding strategy of the S genome segment of Hantaan virus. Virology 155, 633-643.
SCHMALJOHN, C. S., SCHMALJOHN, A. L., and DALRYMPLE, J. M. (1987). Hantaan virus M RNA: Coding strategy, nucleotide sequence, and gene order. Virology 157, 31-39.
YANAGIHARA, R., DAUM, C. A., LEE, P. W., BAEK, L. J., AMYX, H. L., GAJDUSEK, D. C., and GIBBS, C. J., JR. (1987). Serological survey of Prospect Hill virus infection in indigenous wild rodents in the USA. Trans. R. Soc. Trop. Med. Hyg. 81, 42-45.
YANAGIHARA, R., GAJDUSEK, D. C., GIBBS, C. J., JR., and TRAUB, R. (1984). Prospect Hill virus: Serological evidence for infection in mammalogists. N. Engl. J. Med. 310, 1325-1326.
YOO, D., and KANG, C. Y. (1987). Genomic comparisons among members of Hantavirus group. In “The Biology of Negative Strand Viruses” (B. W. J. Mahy and D. Kolakofsky, Eds.), pp. 424-431. Elsevier, New York.
ZUCKER, M., and STIEGLER, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133-148.