VIROLOGY 164, 487-497 (1988)

Measles Virus L Protein Evidences Elements of Ancestral RNA Polymerase
BENJAMIN M. BLUMBERG,¹ JOAN C. CROWLEY, JOEL I. SILVERMAN, JOSEPH MENONNA, STUART D. COOK, AND PETER C. DOWLING

Neurology Service, Neurovirology Laboratory, East Orange VA Medical Center, East Orange, New Jersey 07019, and Department of Neurosciences MBS H506, UMDNJ-New Jersey Medical School, 185 South Orange Ave., Newark, New Jersey 07103

Received November 5, 1987; accepted December 30, 1987
We have determined the nucleotide sequence of the measles virus (MV) L gene using a cDNA library encompassing the entire MV genome (J. Crowley et al. (1987) Intervirology 28, 65-77). The L gene is 6639 nucleotides in length and contains a single long open reading frame that could code for a protein of 247,611 Da. Both the L gene and, in particular, the predicted L protein of MV bear substantial homology to their counterparts in Sendai virus and Newcastle disease virus, suggesting that the multifunctional nature of paramyxovirus L proteins imposes strong evolutionary constraints. The predicted MV L protein also contains distinct elements of a postulated ancestral RNA polymerase. © 1988 Academic Press, Inc.
INTRODUCTION

Measles virus (MV), a member of the morbillivirus subgroup of paramyxoviruses (Kingsbury et al., 1978), is an important human pathogen (Morgan and Rapp, 1977). Most unvaccinated children experience MV infection, which can entail considerable morbidity and high mortality (Wild, 1986), and epidemic outbreaks occasionally occur even in fully vaccinated populations (Gustafson et al., 1987). To provide a rationale for disease pathogenesis and future vaccine development, a thorough understanding of MV at the molecular level is required. MV contains over 15,000 nucleotides of genetic information in a single RNA of antimessage (-) polarity, tightly bound in a helical nucleocapsid structure (Choppin and Compans, 1975). The nucleocapsid is the obligatory template for viral transcription and replication; unencapsidated viral genomic RNA is transcriptionally inactive and not infectious. Three structural proteins associated with the viral nucleocapsid embody a nucleocapsid-specific RNA polymerase activity containing all enzymatic activities required for viral mRNA synthesis, i.e., capping, methylation, and polyadenylation. The large protein (L, >200 kDa) and the phosphoprotein (P, ~70 kDa) are associated with the viral nucleocapsid and function in catalytic amounts as the viral RNA polymerase. The nucleocapsid protein (N, ~60 kDa) is required in stoichiometric amounts for encapsidation and confers on the nucleocapsid its flexible helical conformation (Heggeness et al., 1981). N also confers on the genomic RNA its template activity; in this respect N is functionally a part of the viral polymerase. The MV gene order is 3′-N, P/C, M, F, H, L-5′ (Dowling et al., 1986; Rima et al., 1986; Yoshikawa et al., 1986). The viral RNA polymerase is thought to engage the genome at a unique 3′-end promoter and, in its passage toward the 5′ end, to undergo attenuation at each gene boundary, thus giving rise to a polarity gradient of transcription in which the N mRNA is the most abundant (Kingsbury, 1974; Iverson and Rose, 1981; Cattaneo et al., 1987). Based on these considerations, and on their similar genome organization, RNA synthesis in all paramyxoviruses is thought to be a complex process requiring specific interactions between at least the L, P, and N proteins and sequences in the genome RNA. Because of the large size of the L protein, the majority of the activities involved are suspected to reside in its functional domains, analogously to the L protein of vesicular stomatitis virus (VSV) (Schubert et al., 1984, 1985).

The large size of the MV L mRNA, and its relative scarcity due to the polarity gradient, have hindered full-length cDNA cloning of this transcript. Here we report the determination of the nucleotide sequence of the MV L gene from a genomic library containing overlapping clones encompassing the entire MV genome (Crowley et al., 1987). In a separate communication, we also report the sequence of the genome 5′ end, thus completing the sequence determination of the MV genome. Comparison of the predicted primary structure of the MV L protein with those of Sendai virus (Shioda et al., 1986; Morgan and Rakestraw, 1986) and Newcastle disease virus (NDV) (Yusoff et al., 1987) shows a very high degree of homology among these paramyxoviruses. In addition, “GDD” sequences, proposed to be characteristic of RNA-dependent RNA polymerases (Kamer and Argos, 1984), are present in the predicted MV L protein, suggesting that paramyxoviruses also may have evolved from a putative common ancestor.

¹ To whom requests for reprints should be addressed at UMDNJ-New Jersey Medical School, 185 South Orange Ave., Newark, NJ 07103.
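The polarity gradient described above follows from a simple sequential model: if the polymerase enters only at the 3′ promoter and a fraction of polymerases fails to continue past each gene boundary, mRNA abundance falls geometrically along the gene order. The following is a minimal Python sketch of that idea, assuming a single uniform read-through probability at every junction; real attenuation need not be uniform, the value 0.7 is purely illustrative rather than a measurement from this paper, and the function name is hypothetical.

```python
# Minimal sketch of the transcriptional polarity gradient.
# ASSUMPTION: one uniform read-through probability at every gene junction.
# The 0.7 used below is purely illustrative, not a measured value.
GENE_ORDER = ["N", "P/C", "M", "F", "H", "L"]  # 3' -> 5' order from the text


def relative_abundance(genes, readthrough):
    """Fraction of polymerases (relative to N) expected to transcribe each gene.

    Every polymerase enters at the single 3' promoter and transcribes the
    first gene; at each gene boundary only `readthrough` of them continue.
    """
    if not 0.0 < readthrough <= 1.0:
        raise ValueError("readthrough must be in (0, 1]")
    return {gene: readthrough ** i for i, gene in enumerate(genes)}


if __name__ == "__main__":
    for gene, level in relative_abundance(GENE_ORDER, readthrough=0.7).items():
        print(f"{gene:<4} {level:.3f}")
```

Under this illustrative setting the L gene, last in the order, is transcribed at about 17% of the N level (0.7 to the fifth power is about 0.168), which matches in direction the relative scarcity of L mRNA noted in the text.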
MATERIALS AND METHODS

Preparation of virus and extraction of MV RNAs
Our laboratory standard is the Udem strain of Edmonston vaccine MV adapted to HeLa cells in spinner culture (Udem and Cook, 1984; Udem, 1984). RNAs are extracted by previously reported methods (Leppert et al., 1979; Dowling et al., 1983; Crowley et al., 1987).

Molecular cloning of MV genomic RNA and characterization of L gene clones
MV nucleocapsid RNA was reverse-transcribed by random priming and inserted into the PstI site of pBR322 by G/C tailing, as previously described (Dowling et al., 1983; Crowley et al., 1987). Of the 6000 clones obtained, 608 containing the longest MV-specific inserts were selected by colony hybridization and organized by reference to a previously characterized MV library (Dowling et al., 1986). Clones containing L gene sequences were identified by Northern blotting, and their exact positions relative to the MV genome were determined by a combination of colony hybridization, restriction endonuclease and S1 nuclease mapping, and primer extension experiments (Crowley et al., 1987). Fifteen of these clones contained inserts of >1000 bp, which greatly aided their mapping. Of the 608 selected clones, over 200 spanning the L gene region were mapped, in proportion to the size of the L gene relative to the MV genome.

Sequencing of MV L gene clones
Our sequencing strategy was dictated by the possession of an organized library of overlapping L clones, and all sequence analysis was performed on plasmid DNA. Every nucleotide reported was sequenced in at least two clones, and more than 90% of the sequence was determined in four or more clones. Altogether, the sequence reported is based on the determination of over 40,000 nucleotides. Some sequence variation between isomorphous clones was expected (Schubert et al., 1984). Where corresponding sequences did not coincide exactly, or where there was sequence ambiguity, we exploited the depth of our library by sequencing additional clones until a consensus sequence became apparent.

Because of the presence of G/C tails in our clones, we chose as our primary tool the relatively new technique of supercoil plasmid sequencing by the dideoxy chain-termination method (Chen and Seeburg, 1985; Korneluk et al., 1985). In this technique, the plasmid DNA is alkali-denatured and then neutralized, leaving extensive looped-out single-strand regions. The use of commercial synthetic oligonucleotide primers specific for the PstI site of pBR322 (Wallace et al., 1981) and AMV reverse transcriptase (Life Sciences) enabled us to sequence through the G/C tails and to obtain sequences of >300 nucleotides from both ends of an insert. We followed a simple protocol (Pharmacia Technical Bulletin) that is rapid and well suited for sequencing clones with insert sizes of 400-600 bp. When applied to clones with longer inserts, however, this technique often left gaps in the sequence near the middle of the insert. We found that this problem was best overcome by using the remains of dideoxy reaction mixtures as probes for colony hybridization, to locate among our series of overlapping clones candidates with smaller inserts, or inserts with ends near the gap. Overlapping clones identified in this manner also proved helpful in resolving compressions and other ambiguities due to secondary structure inherent in a particular insert. For complete sequencing of clones with large inserts, and to clarify any remaining ambiguities, we supplemented this procedure with Maxam-Gilbert (1980) chemical sequencing from appropriate internal restriction and end-labeling sites. Some of these sites were also used in conjunction with S1 nuclease mapping to determine the boundaries of the MV L gene (see Fig. 2 and also Crowley et al., 1987). Figure 1 presents a map of the MV L gene, showing the clones and restriction sites used for sequencing and S1 mapping.

S1 nuclease mapping
The position of the MV L mRNA 3′ end was determined as in previously published S1 nuclease mapping studies of MV intercistronic regions, including the H-L boundary (Crowley et al., 1987), based on techniques developed for mapping Sendai virus genes (Giorgi et al., 1983; Blumberg et al., 1984b, 1985a, b). Samples were electrophoresed on 8% sequencing gels.

Computer analysis of the MV L gene and protein and their homologies
The Pustell DNA and protein sequence analysis programs were purchased from IBI and used on an IBM PC/AT computer.
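The sequencing procedure resolves discrepancies between clones by sequencing additional clones until a consensus becomes apparent, with every reported nucleotide covered by at least two clones. The paper does not state a formal calling rule, so the following is only a minimal Python sketch, assuming the clone reads are already aligned to common coordinates and that a simple per-position majority vote is acceptable; `consensus`, `min_coverage`, and the example reads are hypothetical.

```python
from collections import Counter


def consensus(reads, min_coverage=2):
    """Per-position majority-vote consensus of pre-aligned clone reads.

    reads: equal-length strings on a shared coordinate system; '-' marks
    positions a given clone does not cover. A position is reported as 'N'
    (and listed as unresolved) when fewer than `min_coverage` clones cover
    it or when the two most frequent bases are tied.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to equal length")
    bases, unresolved = [], []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if r[pos] != "-")
        top = counts.most_common(2)
        coverage = sum(counts.values())
        tied = len(top) == 2 and top[0][1] == top[1][1]
        if coverage < min_coverage or tied:
            bases.append("N")
            unresolved.append(pos)
        else:
            bases.append(top[0][0])
    return "".join(bases), unresolved


if __name__ == "__main__":
    # Three hypothetical overlapping clones of the L gene 5' end; the second
    # carries an invented G->C discrepancy at position 4.
    clones = ["AGGGTCCA--", "AGGCTCCAAG", "-GGGTCCAAG"]
    print(consensus(clones))  # ('AGGGTCCAAG', [])
```

A tie or single-clone coverage is flagged rather than guessed, which mirrors the practice of sequencing further clones until the ambiguity disappears.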
RESULTS

MV L gene sequence
The nucleotide sequence of the MV L gene is presented in Fig. 2 as mRNA (+) sense DNA. The L gene begins with AGGGTCCAAGTGG, in accord with the H-L intercistronic sequence reported by Cattaneo et al. (1987) and our S1 mapping data (Crowley et al., 1987). The first AUG occurs only 23 nucleotides from the start of the L mRNA and is in a favored context (CCGxxAUGG) to serve as a eukaryotic initiator codon, according to Kozak (1984, 1986). This AUG begins a long open reading frame (ORF) that terminates with a UAA codon at position 6572, backed up at position 6614 by a second in-frame UAG termination codon. This long ORF contains 2183 codons and could code for a protein of 247,611 Da. In a separate communication, we report the determination of the final intercistronic and 5′-end region of the MV genome. We show below that the L gene terminates with the sequence AATATATTAAAGAAAA, a variant of the canonical MV termination signal (Cattaneo et al., 1987). Excluding the final four A's, which form part of the poly(A) tail of the L mRNA, the MV L gene is 6639 nucleotides long and contains only 90 untranslated nucleotides.

S1 nuclease mapping of the L mRNA 3′ end
The DNA sequence of the insert of clone 30 appeared to contain the canonical MV termination sequence, as well as a unique HinfI site near one end of the insert, at position 6444 of the L gene. We therefore used 3′-end-labeled HinfI restriction fragments of this clone to map the 3′ end of the L mRNA (Fig. 3). In these experiments, digestion of some of the labeled material with PstI gives a band on denaturing gel electrophoresis that defines the insert length; thus, a band shorter than the distance from the labeling site to the G/C tail represents the end of an mRNA. In the left panel, a band at about 190 bp (arrow) appeared when the 3′-labeled DNA was annealed with CsCl pellet RNA from MV-infected cells and S1 nuclease treated (lanes 2 and 3). This band was strongest when a large quantity of RNA was used (lane 3) and was absent when the labeled digest was annealed with mock-infected RNA or with MV genomic RNA (lanes 4 and 5). Note that the HinfI digest pattern is visible in some lanes due to overloading. The PstI digest of the labeled DNA (lane 1) also revealed the 240-bp insert band (dot) just below the 246-bp vector band. This band was cut out of the gel for sequencing. The middle panel of Fig. 3 shows a close-up of the S1 experiment, along with the Maxam-Gilbert sequence of the 240-bp HinfI band. The T+C reaction unfortunately worked poorly; however, the pattern is distinctive, and the termination sequence can be clearly read from the autoradiogram of a different sequencing gel (right panel). In the close-up view, the S1 bands (arrows) are spread throughout the TTCTTTT termination sequence, most likely due to nibbling by the nuclease.

Coding potential of the L gene
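The coordinates reported in the two preceding subsections are mutually consistent, which can be checked in a few lines. This minimal Python sketch assumes 1-based positions and that the 2183 codons exclude the stop codon, the reading under which 23 + 3 × 2183 = 6572 and 6639 - 3 × 2183 = 90. The first 31 nucleotides are copied from the start of the Fig. 2 sequence, the mean residue mass confirms that 247,611 is a value in daltons, and the S1 line simply adds the approximate 190-nt band to the HinfI site, so only rough agreement with 6639 is expected. All constant and function names are illustrative.

```python
# Consistency check of the MV L gene coordinates quoted in the text.
# Positions are 1-based; names below are illustrative, not from the paper.
L_START_31NT = "AGGGTCCAAGTGGTTCCCCGTTATGGACTCG"  # first 31 nt (Fig. 2)
GENE_LENGTH = 6639        # nt, excluding the final four A's
SENSE_CODONS = 2183       # ORF codons, assumed to exclude the stop codon
FIRST_STOP = 6572         # UAA
SECOND_STOP = 6614        # backup UAG
PROTEIN_MASS_DA = 247_611
HINFI_SITE = 6444         # labeled HinfI site used for S1 mapping
S1_BAND_NT = 190          # "about 190": approximate band size


def first_atg(seq, upstream=5, downstream=1):
    """Return (1-based position, context string) for the first ATG, or None."""
    i = seq.find("ATG")
    if i < 0:
        return None
    return i + 1, seq[max(0, i - upstream):i + 3 + downstream]


def in_frame(pos_a, pos_b):
    """True if two 1-based codon start positions lie in the same frame."""
    return (pos_b - pos_a) % 3 == 0


start, context = first_atg(L_START_31NT)
stop = start + 3 * SENSE_CODONS                # first nt of the stop codon
untranslated = GENE_LENGTH - 3 * SENSE_CODONS  # 5' UTR + stop + 3' UTR
mean_residue_mass = PROTEIN_MASS_DA / SENSE_CODONS
s1_end_estimate = HINFI_SITE + S1_BAND_NT      # rough mRNA 3' end

print(start, context)                      # 23 CCGTTATGG
print(stop, in_frame(stop, SECOND_STOP))   # 6572 True
print(untranslated)                        # 90
print(round(mean_residue_mass, 1))         # 113.4
print(s1_end_estimate)                     # 6634
```

The 90 untranslated nucleotides split into 22 nt upstream of the AUG and 68 nt from the UAA onward.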
FIG. 1. Map of the MV L gene. Top to bottom: scale in kilobases; open reading frame; restriction sites used for Maxam-Gilbert sequencing and S1 nuclease mapping; results of S1 nuclease experiment to determine the L mRNA 3′ end (see Fig. 3); clones used for sequencing (bars indicate the MV gene region represented by the inserts). Based on previously published data for the MV gene and intercistronic sequences (Rozenblatt et al., 1985; Bellini et al., 1985, 1986; Alkhatib and Briedis, 1986; Richardson et al., 1986; Cattaneo et al., 1987) and on our own sequence data (Crowley et al., 1988), the L gene starts at position 9210 of the genome.
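The termination-codon panels of Fig. 4 amount to a six-frame stop-codon scan. A minimal Python sketch, assuming frame k (k = 1, 2, 3) reads codons starting at nucleotide k, so that the AUG at position 23 (23 = 2 + 3 × 7) falls in frame 2, the frame the text reports as open; the numbering of frames 4-6 on the reverse complement is our own convention and may not match the figure. Function names are illustrative, and this is not the IBI/Pustell program.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_positions(seq, frame):
    """1-based start positions of stop codons in forward frame 1, 2, or 3.

    Frame k reads codons beginning at position k, k + 3, k + 6, ...
    """
    return [i + 1 for i in range(frame - 1, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def six_frame_stops(seq):
    """Stop positions in frames 1-3 (given strand) and 4-6 (reverse
    complement; positions are counted along the reverse-complement strand)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    stops = {k: stop_positions(seq, k) for k in (1, 2, 3)}
    stops.update({k + 3: stop_positions(rc, k) for k in (1, 2, 3)})
    return stops


if __name__ == "__main__":
    demo = "AGGGTCCAAGTGGTTCCCCGTTATGGACTCG"  # first 31 nt of the L gene
    for frame, positions in six_frame_stops(demo).items():
        print(frame, positions)
```

On the first 31 nt of the L gene, all three forward frames are open and only frame 6 (as numbered here) has a stop, at position 9 of the reverse-complement strand; applied to the full 6639-nt sequence, a frame with no stops between the initiator and position 6572 would correspond to the full-length open frame.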
The codon usage of the MV L gene is similar to that of VSV (Schubert et al., 1984), Sendai virus (Shioda et al., 1986; Morgan and Rakestraw, 1986), and NDV (Yusoff et al., 1987), and shows the expected bias against CpG dinucleotides characteristic of many mammalian mRNAs (Grantham, 1978; Bird, 1980). The nucleotide composition (29.4% A, 21.3% C, 22.7% G, 26.6% T) is also similar to that of VSV, Sendai virus, and NDV. Figure 4 shows the coding regions of the L gene according to the IBI/Pustell program (Pustell and Kafatos, 1984), with the termination codons in the six possible frames shown as T's in the narrow left and right panels. Note that only frame 2 (second from the left) is open full-length. The center panel shows the coding potential of the six L gene reading frames. Interestingly, according to this algorithm, which was intended to distinguish exons from introns in eukaryotic DNA, frame 2, which contains the putative L ORF, is not always strongly favored over all other potential reading frames. Frames 3 and 4 in particular are also predicted to have high coding potential in some regions of the L gene, even though they are substantially blocked by termination codons. We find no indication of internal transcriptional start signals in these regions, nor have we found smaller transcripts from the L gene region (unpublished results). Moreover, the possible presence of genome (-) sense transcripts in RNAs extracted from MV-infected cells has been ruled out (Udem and Cook, 1984). Thus, barring the future demonstration of small functional proteins encoded by internal regions of the MV L mRNA, we conclude that the only significant L gene product is the 247,611-Da protein.

Predicted primary structure of the MV L protein
The predicted amino acid sequence of the MV L protein is shown in Fig. 2 beneath the gene sequence. Its length of 2183 amino acids lies within the range set by the L proteins of VSV (2109), Sendai virus (2048, Morgan and Rakestraw, 1986; or 2228, Shioda et al., 1986), and NDV (2204), and its deduced amino acid composition is comparable to that of these viruses. In particular, its Leu + Ile content of 18.6% is more than 1.5 times that of an “average” protein (Dayhoff et al., 1978). The MV L protein is predicted to be quite basic (pI = 9.45), on a par with that of VSV and considerably more basic than the predicted L proteins of Sendai virus or NDV. The MV L protein contains 11 clusters of basic amino acids with a net charge of ≥+4, suggesting that some of these might represent domains involved in electrostatic interactions with the RNA genome. Uniquely among the sequences compared, the MV L protein displays a small hydrophobic region at its N-terminus.

[FIG. 2: MV L gene sequence, nucleotides 1-6639, with deduced amino acid sequence]
FIG. 2. Sequence of the MV L gene and deduced L protein primary structure. The L gene sequence is presented as (+) DNA; the L protein is presented in standard single-letter amino acid code. GDDD and LDD homologies to the postulated ancestral RNA polymerase (see text) are underlined. Asterisks denote double termination codons.

FIG. 3. Determination of the L mRNA 3′ end by S1 nuclease mapping. Left panel, lane 1: HinfI digest of clone 30 plasmid DNA, 3′ end-labeled and redigested with PstI. Lanes 2 and 3: labeled DNA annealed with 20 or 100 µg of CsCl pellet RNA from MV-infected HeLa cells and S1 nuclease treated. Lanes 4 and 5: labeled DNA annealed with 1 µg of MV genomic RNA or 100 µg of CsCl pellet RNA from uninfected HeLa cells and S1 nuclease treated. Middle panel: lanes 1, 2, and 3 are the same as lanes 1, 3, and 5 of the left panel; Maxam-Gilbert sequence lanes are lettered. Right panel: expanded view of Maxam-Gilbert sequence showing the L gene termination region (presented as minus-sense DNA).

FIG. 4. Coding potential of the MV L gene. The positions of nucleotides in the L gene or anti-L-gene are shown beside the bands at the left and right, representing reading frames 1-3 and 4-6, respectively. The center panel shows the coding potential of the six frames (numbered bars), calculated using a “C statistic” derived by the product method from the coding bias table of the L gene with a window of 50; values above M = 1.7 are considered significant.

Homologies of MV L gene and protein
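The comparisons below were made with the IBI/Pustell homology program, whose scoring matrix and window parameters are not given here. As a simplified stand-in (an identity-only dot plot, not the Pustell algorithm itself), the following Python sketch shows why a continuous diagonal indicates colinear relatedness: windows at corresponding positions in two related sequences keep matching. The window size, match threshold, and the single-substitution variant are hypothetical; only the MV block GGIEGYCQKLWTI is taken from the text.

```python
def dot_plot(a, b, window=5, min_matches=4):
    """Identity dot plot: 0-based (i, j) pairs where the windows starting at
    a[i] and b[j] agree at >= min_matches aligned positions."""
    if window < 1 or not 0 < min_matches <= window:
        raise ValueError("need window >= 1 and 0 < min_matches <= window")
    hits = []
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(wa, b[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


def diagonal_hits(hits, offset=0):
    """Hits lying on the diagonal j - i == offset (colinear similarity)."""
    return [(i, j) for i, j in hits if j - i == offset]


if __name__ == "__main__":
    mv_block = "GGIEGYCQKLWTI"   # conserved MV block quoted in the text
    variant = "GGIEGICQKLWTI"    # HYPOTHETICAL single-substitution variant
    hits = dot_plot(mv_block, variant)
    n_windows = len(mv_block) - 5 + 1
    print(len(diagonal_hits(hits)), "of", n_windows, "diagonal windows hit")
```

With one substitution, every diagonal window still has at least four of five identities, so the diagonal stays unbroken; divergent sequences such as MV and VSV would instead show only scattered diagonal fragments, as the text describes.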
When the predicted MV L protein was compared with that of Sendai virus by means of the IBI/Pustell homology algorithm (Pustell and Kafatos, 1982), a strong diagonal along most of the length of the proteins indicated that these proteins are linearly and very closely related. When MV was next compared with NDV by this technique, using the same matrix parameters, the overall homology was weaker than with Sendai virus but still strong over large regions of the L proteins of these paramyxoviruses. However, when MV was compared with VSV, homologies along the diagonal were reduced to three regions of low percentage match (data not shown). These results demonstrate that the L proteins are strongly conserved among paramyxoviruses, whereas MV has diverged considerably from VSV. This impression is reinforced by similar comparison of the L gene sequences, where good correlation on the diagonal was observed between MV, Sendai virus, and NDV, whereas correlation between MV and VSV was no stronger than random scatter (data not shown).

Sequence homologies between MV, Sendai, NDV, and VSV L proteins are shown in Fig. 5A. Note that although the homologies with VSV are of modest degree, they are colinear with the paramyxovirus protein sequences, suggesting a direct evolutionary relationship between the L genes. Charged and hydrophobic amino acids are strongly conserved, and substitutions tend to preserve the physical properties of conserved blocks. In the region of strongest VSV homology, from amino acids 730-780 of MV, two blocks of amino acids, GGIEGYCQKLWTI and LVQGDNQTI, are conserved with precise spacing, suggesting that this extended region is vital for function. Noteworthy features of the first block are the paired glycine residues, echoed at positions 938-939, and the presence of the rare amino acid tryptophan, which may demarcate a surface region of the protein.

[FIG. 5: aligned L protein segments of MV, Sendai virus (SV), NDV, and VSV]
FIG. 5. (A) Comparison of predicted MV, Sendai virus (Shioda et al., 1986), NDV, and VSV L proteins, showing regions of highest sequence conservation. (B) GDD homologies in paramyxovirus L protein sequences.

MV L protein contains distinct elements of ancestral RNA polymerase
In addition to the extensive sequence homology with other paramyxoviruses described above, the MV
L protein also contains regions of distinct homology to conserved sequences that have been shown to be present in a number of RNA viruses and proposed to represent the nucleic acid recognition or active site of an ancestral RNA-dependent RNA polymerase (Kamer and Argos, 1984). The most characteristic feature of this conserved sequence is a central Gly-Asp-Asp (GDD) flanked by strongly hydrophobic regions. The MV L protein contains two GDD sequences, at positions 3770-3779 and 4412-4420 of the L gene. Interestingly, each is extended by a third D, and these GDDD sequences (underlined below and in Fig. 2) are flanked by predominantly hydrophobic amino acids (IATVYSWAYGDDDSSWNEAWLLA and ISALIGDDDINSFITEFLLI, respectively). The MV L protein also contains an LDD sequence at positions 3614-3622. A comparison of this region of the paramyxovirus L proteins is shown in Fig. 5B. The sequence surrounding the first GDDD of MV is very highly conserved in both Sendai virus and NDV, but in each case two of the D’s are altered. Both alterations involve substitution of Glu for Asp, thus maintaining two negatively charged amino acids at the site of homology. In contrast, the hydrophobic sequence surrounding the second GDDD of MV, together with its three D’s, is preserved in Sendai virus, but there is no counterpart to this sequence in NDV. On the other hand, NDV and Sendai virus contain a strong homology to the LDD and
flanking hydrophobic sequences of MV, except that the D’s are now separated by an intervening amino acid. These hydrophobic flanking sequences include three tryptophan residues. Another striking feature is the conserved pair of cysteine residues at positions 1369-1370 (asterisk), which could anchor the putative active sites within the secondary structure of the L protein.
DISCUSSION
In this report, we describe the sequence determination of the MV L gene, deduce the primary structure of the MV L protein, and compare it with the deduced L proteins of two other paramyxoviruses, Sendai virus and NDV, and with that of VSV. The deduced MV L protein sequence is probably correct. Several observations support this: the L proteins have similar sizes and amino acid compositions, with Leu + Ile levels almost twice those of an “average” protein, and the predicted paramyxovirus L proteins are strongly homologous. To date, however, only the VSV L protein sequence has been directly confirmed, by construction and expression of a full-length clone (Meier et al., 1987; Schubert et al., 1985). The MV L protein exhibits one puzzling feature: its predicted N-terminal sequence is Met-Asp. The initiator Met of cytoplasmic proteins is frequently removed in vivo (Wold, 1981). According to the “N-end rule” of Bachmair et al. (1986), a protein with an N-terminal Asp would then be predicted to have a remarkably short half-life of about 3 min in the cytoplasm. A low cytoplasmic concentration of MV L protein may well favor transcriptional activity (Meier et al., 1987), but there is otherwise no reason to expect the protein to be especially unstable. Because L protein function continues throughout the MV replication cycle, the N-terminus of the MV L protein is likely protected in some manner in vivo.
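The N-end rule argument above can be made concrete with a small lookup. This is a minimal sketch, not the authors' analysis. The half-life table is a partial, approximate transcription of the yeast values reported by Bachmair et al. (1986), and only the first two residues of MV L (Met-Asp) are used. The function names and the `n_acetylated` flag, which models the N-terminal blocking hypothesis, are hypothetical.

```python
# Approximate in vivo half-lives by N-terminal residue (yeast data of
# Bachmair et al., 1986); only a subset of residues is listed here.
N_END_HALF_LIFE = {
    "M": ">20 h", "S": ">20 h", "A": ">20 h",
    "T": ">20 h", "V": ">20 h", "G": ">20 h",
    "F": "~3 min", "L": "~3 min", "D": "~3 min", "K": "~3 min",
    "R": "~2 min",
}


def predicted_n_terminus(seq: str, met_removed: bool) -> str:
    """Return the N-terminal residue after optional removal of the initiator Met."""
    if met_removed and seq.startswith("M") and len(seq) > 1:
        return seq[1]
    return seq[0]


def n_end_half_life(seq: str, met_removed: bool, n_acetylated: bool = False) -> tuple[str, str]:
    """Look up the approximate N-end rule half-life of the predicted N-terminus.

    n_acetylated (hypothetical flag) models the idea that a blocked N-terminus
    escapes ubiquitination, in which case the N-end rule is not applied.
    """
    residue = predicted_n_terminus(seq, met_removed)
    if n_acetylated:
        return residue, "blocked: N-end rule not applied"
    return residue, N_END_HALF_LIFE.get(residue, "not in this table")


# Hypothetical usage with only the first two residues of MV L (Met-Asp).
for removed in (True, False):
    print("Met removed:", removed, n_end_half_life("MD", met_removed=removed))
```

The two scenarios bracket the puzzle described in the text. If Met is cleaved, the exposed Asp falls among the short-lived residues. If Met is retained, the table predicts a stable protein.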
Post-translational processing of the Sendai virus N and M proteins, and of the VSV N protein, has indeed been documented by direct analysis of the blocked N-termini of these cytoplasmic proteins (Rose et al., 1984; Blumberg et al., 1984a, b). In each case, processing involved cleavage of the initiator Met and N-acetylation of the following amino acid. The proposed mechanism of protein destabilization involves preferential ubiquitination of certain N-termini (Bachmair et al., 1986; Fried et al., 1987). N-terminal blockage, e.g., by acetylation, may therefore confer stability by preventing ubiquitination. It is tempting to propose that the N-terminus of the MV L protein may be processed to Ac-Asp in vivo;
however, Asp is not among the N-terminal amino acids frequently found acetylated (Wold, 1981). Alternatively, the initiator Met may not be removed from the MV L protein, perhaps because of folding at its hydrophobic N-terminus. The MV L gene and protein both exhibit very close homology with their Sendai virus counterparts. This homology is far stronger and more extensive than any reported between other pairs of genes or proteins of these paramyxoviruses. We also found considerable homology between MV and NDV, though less than with Sendai virus, and only very distant homology between the MV and VSV L proteins. Notably, considerable homology was found between the L proteins of NDV and VSV (Yusoff et al., 1987), whereas very little was detected between Sendai virus and VSV. Taken together, these findings suggest an evolutionary pathway leading from VSV to NDV to Sendai virus to MV, although more rigorous statistical analysis of the data will be needed. This pathway is consistent with a narrowing of the host range of these viruses over the course of evolution. It also suggests that this process may have included the evolution in paramyxoviruses of two separate glycoprotein genes from the G gene region of rhabdoviruses. The deduced MV L protein shows distinct homologies with a highly conserved GDD sequence proposed to represent part of the active site of an ancestral RNA polymerase. However, the MV GDD homologies also differ significantly from those compiled previously (Kamer and Argos, 1984; Gustafson et al., 1987), which represent mainly (+) strand RNA viruses and reverse transcriptases. The differences include the loss of the conserved upstream GxxxTxxxN sequence, extension of the GDD sequence to GDDD, duplication of the GDDD region, and divergence to an LDD region.
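The graded pairwise homologies summarized above were obtained with dot-matrix comparisons (IBI/Pustell program, described in Results). The idea can be illustrated with a minimal sketch. Note that this is not the Pustell and Kafatos scoring scheme. It uses plain residue identity in a sliding window, and the window size and match threshold are arbitrary assumptions. A "strong diagonal" is summarized here by the hypothetical helper `strongest_diagonal`, which reports the offset with the most hits.

```python
from collections import Counter


def dot_matrix(a: str, b: str, window: int = 5, min_matches: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) for every pair of windows a[i:i+window], b[j:j+window]
    that share at least min_matches identical residues at aligned positions."""
    hits = []
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(wa, b[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


def strongest_diagonal(hits: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (offset, count) for the diagonal j - i that holds the most hits."""
    if not hits:
        return 0, 0
    counts = Counter(j - i for i, j in hits)
    return counts.most_common(1)[0]


# Toy example using an MV sequence quoted in Results: self-comparison,
# then comparison with a copy shifted by two residues.
region = "IATVYSWAYGDDDSSWNEAWLLA"
print(strongest_diagonal(dot_matrix(region, region)))
print(strongest_diagonal(dot_matrix(region, "AA" + region)))
```

Closely related proteins such as MV and Sendai L would put many hits on one diagonal across most of their length. Distant ones such as MV and VSV would leave only scattered short runs. The nested loops are quadratic, so full-length L proteins of about 2000 residues would need a faster implementation.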
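The GDD-related features discussed above can be located with a simple pattern scan. These features are a GDD extended to GDDD, an LDD variant, and hydrophobic flanks. This is a hedged sketch, not the method used for Fig. 5B. The regular expression, the flank width of 8 residues, and the hydrophobic residue set are assumptions. The sketch is run on the two MV GDDD regions quoted in Results.

```python
import re

# Assumed hydrophobic residue set for this sketch (one-letter codes).
HYDROPHOBIC = set("AVILMFWY")

# GDD or LDD, optionally extended by a third D (GDDD/LDDD).
MOTIF = re.compile(r"[GL]DDD?")


def scan_gdd_like(seq: str, flank: int = 8) -> list[tuple[int, str, float]]:
    """Return (1-based start, motif, hydrophobic fraction of flanking residues) per hit."""
    hits = []
    for m in MOTIF.finditer(seq):
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        flanks = left + right
        frac = sum(aa in HYDROPHOBIC for aa in flanks) / len(flanks) if flanks else 0.0
        hits.append((m.start() + 1, m.group(), frac))
    return hits


# The two MV GDDD regions quoted in the text.
for region in ("IATVYSWAYGDDDSSWNEAWLLA", "ISALIGDDDINSFITEFLLI"):
    for pos, motif, frac in scan_gdd_like(region):
        print(region, pos, motif, f"{frac:.2f}")
```

NDV and Sendai virus carry a variant in which the D's are separated by one residue. The pattern could be widened to catch it (e.g. `LD.D`), at the cost of more chance hits in a protein of this size.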
These differences may relate to the profound divergence in replication strategy between the (+) RNA viruses and the (-) RNA viruses. In the (+) RNA viruses, the viral polymerase transcribes naked viral RNA. In the (-) RNA viruses, the nucleocapsid is the obligatory template for the viral polymerase. Exact GDD sequences are not present in the L proteins of Sendai virus or NDV, although attention has been called to weaker LDD and SDD sequence homologies (Shioda et al., 1986; Yusoff et al., 1987). The importance of pursuing these homologies lies in the likelihood that some of these sequences comprise part of the nucleotide recognition and/or polymerization site of the L proteins, or vestiges of such sites left in the course of evolutionary change. For example, Fig. 5A shows an RDDD sequence with hydrophobic flanking sequences at positions 525-527 of the NDV L protein. These sequences have strongly conserved counterparts in both MV and Sendai virus, with a net charge
of -2 maintained by Asp-to-Glu substitutions. This GDD homology would otherwise not be obvious in the L proteins of MV or Sendai virus. Likewise, the highly conserved QGDNQ block shown in Fig. 5A may represent a GDD homology with altered charge. The active sites in the MV L protein remain undefined. However, the paramyxovirus L proteins are highly conserved, owing to their multifunctional nature. Despite their large size, this conservation can be exploited in structural comparisons. As more data become available, the more explicit homologies may serve as a guide for deciphering the functional rules of the active sites in negative-strand RNA virus polymerases.
ACKNOWLEDGMENTS
We thank Eytan Young, Barry Schanzer, Leyla Arik, Deborah Schuback, and Dominador Manalo for technical assistance, and Daniel Kolakofsky, Noel Tordo, Oliver Poch, Lydle Bougueleret, and Amiya Banerjee for helpful discussions. This work was supported by grants from the Veterans Administration, the National Multiple Sclerosis Society, and the UMDNJ Foundation.
REFERENCES
ALKHATIB, G., and BRIEDIS, D. J. (1986). The predicted primary structure of the measles virus hemagglutinin. Virology 150, 479-490.
BACHMAIR, A., FINLEY, D., and VARSHAVSKY, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.
BELLINI, W. J., ENGLUND, G., RICHARDSON, C. D., ROZENBLATT, S., and LAZZARINI, R. A. (1986). Matrix genes of measles virus and canine distemper virus: Cloning, nucleotide sequences, and deduced amino acid sequences. J. Virol. 58, 408-416.
BELLINI, W. J., ENGLUND, G., ROZENBLATT, S., ARNHEITER, H., and RICHARDSON, C. D. (1985). Measles virus P gene codes for two proteins. J. Virol. 53, 908-919.
BIRD, A. P. (1980). DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8, 1499-1504.
BLUMBERG, B. M., GIORGI, C., KOLAKOFSKY, D., ROSE, K., and KOCHER, H. (1984a). Preparation and analysis of the nucleocapsid proteins of vesicular stomatitis virus and Sendai virus, and analysis of the Sendai virus leader-NP gene region. J. Gen. Virol. 65, 769-779.
BLUMBERG, B. M., GIORGI, C., ROSE, K., and KOLAKOFSKY, D. (1985a). Sequence determination of the Sendai virus fusion protein gene. J. Gen. Virol. 66, 317-331.
BLUMBERG, B. M., GIORGI, C., ROUX, L., RAJU, R., DOWLING, P. C., CHOLLET, A., and KOLAKOFSKY, D. (1985b). Sequence determination of the Sendai virus HN gene and its comparison to the influenza virus glycoproteins. Cell 41, 269-278.
BLUMBERG, B. M., ROSE, K., SIMONA, L., ROUX, L., GIORGI, C., and KOLAKOFSKY, D. (1984b). Analysis of the Sendai virus M gene and protein. J. Virol. 52, 656-663.
CATTANEO, R., REBMANN, G., SCHMID, A., BACZKO, K., TER MEULEN, V., and BILLETER, M. A. (1987). Altered transcription of a defective measles virus genome derived from a diseased human brain. EMBO J. 6, 681-688.
CHEN, E. Y., and SEEBURG, P. H. (1985). Supercoil sequencing: A fast and simple method for sequencing plasmid DNA. DNA 4, 165-170.
CHOPPIN, P. W., and COMPANS, R. W. (1975). Reproduction of paramyxoviruses. In “Comprehensive Virology” (H. Fraenkel-Conrat and R. R. Wagner, Eds.), Vol. 4, pp. 95-178. Plenum, New York.
CROWLEY, J., DOWLING, P. C., MENONNA, J., SCHANZER, B., YOUNG, E., COOK, S. D., and BLUMBERG, B. M. (1987). Molecular cloning of 99% of measles virus genome, positive identification of 5’ end clones, and mapping of the L gene regions. Intervirology 28, 65-77.
CROWLEY, J. C., DOWLING, P. C., MENONNA, J., SILVERMAN, J. I., SCHUBACK, D., COOK, S. D., and BLUMBERG, B. M. (1988). Sequence variability and function of measles virus 3’ and 5’ end and intercistronic regions. Virology 164, 498-506.
DAYHOFF, M. O., HUNT, L. T., and HURST-CALDERONE, S. (1978). Composition of proteins. In “Atlas of Protein Sequence and Structure” (M. O. Dayhoff, Ed.), Vol. 5, Suppl. 3, pp. 363-373. Natl. Biomed. Res. Found., Washington, DC.
DOWLING, P. C., BLUMBERG, B. M., MENONNA, J., ADAMUS, J. E., COOK, P., CROWLEY, J. C., KOLAKOFSKY, D., and COOK, S. D. (1986). Transcriptional map of the measles virus genome. J. Gen. Virol. 67, 1987-1992.
DOWLING, P. C., GIORGI, C., ROUX, L., DETHLEFSEN, L. A., GALANTOWICZ, M. E., BLUMBERG, B. M., and KOLAKOFSKY, D. (1983). Molecular cloning of the 3’-proximal third of Sendai virus genome. Proc. Natl. Acad. Sci. USA 80, 5213-5216.
FRIED, V. A., SMITH, H. T., HILDEBRANDT, E., and WEINER, K. (1987). Ubiquitin has intrinsic proteolytic activity: Implications for cellular regulation. Proc. Natl. Acad. Sci. USA 84, 3685-3689.
GIORGI, C., BLUMBERG, B. M., and KOLAKOFSKY, D. (1983). Sendai virus contains overlapping genes expressed from a single mRNA. Cell 35, 829-836.
GRANTHAM, R. (1978). Viral, prokaryote and eukaryote genes contrasted by mRNA sequence index. FEBS Lett. 95, 1-11.
GUSTAFSON, G., HUNTER, B., HANAU, R., ARMOUR, S. L., and JACKSON, A. O. (1987). Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma. Virology 158, 394-406.
GUSTAFSON, T. L., LIEVENS, A. W., BRUNELL, P. A., MOELLENBERG, R. G., CHRISTOPHER, B. S., BUTTERY, M. G., and SEHULSTER, L. M. (1987). Measles outbreak in a fully immunized secondary-school population. N. Engl. J. Med. 316, 771-774.
HEGGENESS, M. H., SCHEID, A., and CHOPPIN, P. W. (1980). Conformation of the helical nucleocapsids of paramyxoviruses and vesicular stomatitis virus: Reversible coiling and uncoiling induced by changes in salt concentration. Proc. Natl. Acad. Sci. USA 77, 2631-2635.
IVERSON, L. E., and ROSE, J. K. (1981). Localized attenuation and discontinuous synthesis during vesicular stomatitis virus transcription. Cell 23, 477-484.
KAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.
KINGSBURY, D. W. (1974). The molecular biology of paramyxoviruses. Med. Microbiol. Immunol. 160, 73-83.
KINGSBURY, D. W., BRATT, M. A., CHOPPIN, P. W., HANSON, R. P., HOSAKA, Y., TER MEULEN, V., NORRBY, E., PLOWRIGHT, W., ROTT, R., and WUNNER, W. H. (1978). Paramyxoviridae. Intervirology 10, 137-152.
KORNELUK, R. G., QUAN, F., and GRAVEL, R. A. (1985). Rapid and reliable dideoxy sequencing of double-stranded DNA. Gene 40, 317-323.
KOZAK, M. (1984). Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 12, 857-872.
KOZAK, M. (1986). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
LEPPERT, M., RITTENHOUSE, I., PERRAULT, J., SUMMERS, D. F., and KOLAKOFSKY, D. (1979). Plus- and minus-strand leader RNAs in negative strand virus infected cells. Cell 18, 735-747.
MAXAM, A. M., and GILBERT, W. (1980). Sequencing end-labelled DNA with base-specific chemical cleavages. In “Methods in Enzymology” (L. Grossman and K. Moldave, Eds.), Vol. 65, pp. 499-560. Academic Press, New York.
MEIER, E., HARMISON, G. G., and SCHUBERT, M. (1987). Homotypic and heterotypic exclusion of vesicular stomatitis virus replication by high levels of recombinant polymerase protein L. J. Virol. 61, 3133-3142.
MORGAN, E. M., and RAKESTRAW, K. M. (1986). Sequence of the Sendai virus L gene: Open reading frames upstream of the main coding region suggest that the gene may be polycistronic. Virology 154, 31-40.
MORGAN, E. M., and RAPP, F. (1977). Measles virus and its associated diseases. Bacteriol. Rev. 41, 636-666.
NEUBERT, W. J., and KOCH, E. M. (1985). Molecular clones representing Sendai virus L-gene: Sequence of 1,179 nucleotides from the 3’ end of the L-gene. Zentralbl. Bakteriol. Mikrobiol. Hyg. B 260, 498-499.
PUSTELL, J., and KAFATOS, F. C. (1982). A high speed, high capacity homology matrix: Zooming through SV40 and polyoma. Nucleic Acids Res. 10, 4765-4782.
PUSTELL, J., and KAFATOS, F. C. (1984). A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res. 12, 643-655.
RICHARDSON, C., HULL, D., GREER, P., HASEL, K., BERKOVICH, A., ENGLUND, G., BELLINI, W., RIMA, B., and LAZZARINI, R. (1986). The nucleotide sequence of the mRNA encoding the fusion protein of measles virus (Edmonston strain): A comparison of fusion proteins from several different paramyxoviruses. Virology 155, 508-523.
RIMA, B. K., BACZKO, K., CLARKE, D. K., CURRAN, M. D., MARTIN, S. J., BILLETER, M. A., and TER MEULEN, V. (1986). Characterization of clones for the sixth (L) gene and a transcriptional map for morbilliviruses. J. Gen. Virol. 67, 1971-1978.
ROSE, K., KOCHER, H. P., BLUMBERG, B. M., and KOLAKOFSKY, D. (1984). An improved procedure, involving mass spectrometry, for N-terminal amino acid sequence determination of proteins which are Nα-blocked. Biochem. J. 217, 253-257.
ROZENBLATT, S., EIZENBERG, O., BEN-LEVY, R., LAVIE, V., and BELLINI, W. J. (1985). Sequence homology within the morbilliviruses. J. Virol. 53, 684-690.
SCHUBERT, M., HARMISON, G. G., and MEIER, E. (1984). Primary structure of the vesicular stomatitis virus polymerase (L) gene: Evidence for a high frequency of mutations. J. Virol. 51, 505-514.
SCHUBERT, M., HARMISON, G. G., RICHARDSON, C. D., and MEIER, E. (1985). Expression of a cDNA encoding a functional 241-kilodalton vesicular stomatitis virus RNA polymerase. Proc. Natl. Acad. Sci. USA 82, 7984-7988.
SHIODA, T., IWASAKI, K., and SHIBUTA, H. (1986). Determination of the complete nucleotide sequence of the Sendai virus genome RNA and the predicted amino acid sequences of the F, HN and L proteins. Nucleic Acids Res. 14, 1545-1563.
UDEM, S. A. (1984). Measles virus: Conditions for the propagation and purification of infectious virus in high yield. J. Virol. Methods 8, 123-136.
UDEM, S. E., and COOK, K. A. (1984). Isolation and characterization of measles virus intracellular nucleocapsid RNA. J. Virol. 49, 57-65.
WALLACE, R. B., JOHNSON, M. J., SUGGS, S. V., MIYOSHI, K., BHATT, R., and ITAKURA, K. (1981). A set of synthetic oligodeoxyribonucleotide primers for DNA sequencing in the plasmid vector pBR322. Gene 16, 21-26.
WILD, T. F. (1985). Measles vaccination: Is a new strategy needed in Third World countries? Vaccine 3, 282.
WOLD, F. (1981). In vivo chemical modification of proteins (posttranslational modification). Annu. Rev. Biochem. 50, 783-814.
YOSHIKAWA, Y., MIZUMOTO, K., and YAMANOUCHI, K. (1986). Characterization of messenger RNAs of measles virus. J. Gen. Virol. 67, 2807-2812.
YUSOFF, K., MILLAR, N. S., CHAMBERS, P., and EMMERSON, P. T. (1987). Nucleotide sequence analysis of the L gene of Newcastle disease virus: Homologies with Sendai and vesicular stomatitis viruses. Nucleic Acids Res. 15, 3961-3976.