Measles virus L protein evidences elements of ancestral RNA polymerase


VIROLOGY 164, 487-497 (1988)

BENJAMIN M. BLUMBERG,¹ JOAN C. CROWLEY, JOEL I. SILVERMAN, JOSEPH MENONNA, STUART D. COOK, AND PETER C. DOWLING

Neurology Service, Neurovirology Laboratory, East Orange VA Medical Center, East Orange, New Jersey 07019, and Department of Neurosciences MBS H506, UMDNJ-New Jersey Medical School, 185 South Orange Ave., Newark, New Jersey 07103

Received November 5, 1987; accepted December 30, 1987

We have determined the nucleotide sequence of the measles virus (MV) L gene using a cDNA library encompassing the entire MV genome (J. Crowley et al. (1987) Intervirology, 28, 65-77). The L gene is 6639 nucleotides in length and contains a single long open reading frame that could code for a protein of 247,611 Da. Both the L gene and, in particular, the predicted L protein of MV bear substantial homology to their counterparts in Sendai virus and Newcastle disease virus, suggesting that the multifunctional nature of paramyxovirus L proteins imposes strong evolutionary constraints. The predicted MV L protein also contains distinct elements of a postulated ancestral RNA polymerase. © 1988 Academic Press, Inc.

¹ To whom requests for reprints should be addressed at UMDNJ-New Jersey Medical School, 185 South Orange Ave., Newark, NJ 07103.

INTRODUCTION

Measles virus (MV), a member of the morbilliform subgroup of paramyxoviruses (Kingsbury et al., 1978), is an important human pathogen (Morgan and Rapp, 1977). Most unvaccinated children experience MV infection, which can entail much morbidity and high mortality (Wild, 1986), and epidemic outbreaks occur, albeit rarely, even in fully vaccinated populations (Gustafson et al., 1987). To provide a rationale for understanding disease pathogenesis and for future vaccine development, a thorough understanding of MV at the molecular level is required.

MV contains over 15,000 nucleotides of genetic information in a single RNA of antimessage (-) polarity, tightly bound in a helical nucleocapsid structure (Choppin and Compans, 1975). The nucleocapsid is the obligatory template for viral transcription and replication; unencapsidated viral genomic RNA is transcriptionally inactive and not infectious. Three structural proteins associated with the viral nucleocapsid embody a nucleocapsid-specific RNA polymerase activity containing all enzymatic activities required for viral mRNA synthesis, i.e., capping, methylation, and polyadenylation. The large protein (L, >200 kDa) and the phosphoprotein (P, ~70 kDa) are associated with the viral nucleocapsid and function in catalytic amounts as the viral RNA polymerase. The nucleocapsid protein (N, ~60 kDa) is required in stoichiometric amounts for encapsidation and confers on the nucleocapsid its flexible helical conformation (Heggeness et al., 1981). N also confers on the genomic RNA its template activity; in this respect N is functionally a part of the viral polymerase.

The MV gene order is 3'-N, P/C, M, F, H, L-5' (Dowling et al., 1986; Rima et al., 1986; Yoshikawa et al., 1986). The viral RNA polymerase is thought to engage the genome at a unique 3'-end promoter and, in its passage toward the 5' end, to undergo attenuation at each gene boundary, thus giving rise to a polarity gradient of transcription, with the N mRNA being the most abundant (Kingsbury, 1974; Iverson and Rose, 1981; Cattaneo et al., 1987). Based on these considerations, and on their similar genome organization, RNA synthesis in all paramyxoviruses is thought to be a complex process requiring specific interactions between at least the L, P, and N proteins and sequences in the genome RNA. Because of the large size of the L protein, the majority of the activities involved are suspected to reside in functional domains of L, analogously to the L protein of vesicular stomatitis virus (VSV) (Schubert et al., 1984, 1985).

The large size of the MV L mRNA, and its relative scarcity due to the polarity gradient, have hindered full-length cDNA cloning of this transcript. Here we report the determination of the nucleotide sequence of the MV L gene from a genomic library containing overlapping clones encompassing the entire MV genome (Crowley et al., 1987). In a separate communication, we also report the sequence of the genome 5' end, thus completing the sequence determination of the MV genome. Comparison of the predicted primary structure of the MV L protein with those of Sendai virus (Shioda et al., 1986; Morgan and Rakestraw, 1986) and Newcastle disease virus (NDV) (Yusoff et al., 1987) shows a very high degree of homology among these paramyxoviruses. In addition, "GDD" sequences, proposed to be characteristic of RNA-dependent RNA polymerases (Kamer and Argos, 1984), are present in the predicted MV L protein, suggesting that paramyxoviruses also may have evolved from a putative common ancestor.
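Computationally, a search for the GDD element is a simple motif scan of the deduced protein sequence. The Python sketch below is illustrative only: the function name, the motif dictionary (GDD plus the GDDD and LDD variants marked in the Fig. 2 legend), and the toy sequence are hypothetical, and a bare string match says nothing about the surrounding sequence context on which Kamer and Argos (1984) based their argument for homology.

```python
import re

# Motifs named in the text and in the Fig. 2 legend; values are regex patterns.
RDRP_MOTIFS = {"GDD": "GDD", "GDDD": "GDDD", "LDD": "LDD"}


def find_motifs(protein: str, motifs: dict = RDRP_MOTIFS) -> dict:
    """Map each motif name to the 1-based start positions of its matches.

    Each pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences (e.g. both GDDs in GDDGDD) are all reported.
    """
    hits = {}
    for name, pattern in motifs.items():
        lookahead = re.compile(f"(?=(?:{pattern}))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(protein.upper())]
    return hits


# Hypothetical toy sequence, not the MV L protein.
toy = "MKTAYGDDDSSWLDDQ"
print(find_motifs(toy))  # {'GDD': [6], 'GDDD': [6], 'LDD': [13]}
```

Run over a full-length L protein sequence, the same call lists every candidate position, which then has to be judged by alignment rather than by the match alone.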

MATERIALS AND METHODS

Preparation of virus and extraction of MV RNAs

Our laboratory standard is the Udem strain of Edmonston vaccine MV adapted to HeLa cells in spinner culture (Udem and Cook, 1984; Udem, 1984). RNAs are extracted by previously reported methods (Leppert et al., 1979; Dowling et al., 1983; Crowley et al., 1987).

Molecular cloning of MV genomic RNA and characterization of L gene clones

MV nucleocapsid RNA was reverse-transcribed by random priming and inserted into the PstI site of pBR322 by G/C tailing, as previously described (Dowling et al., 1983; Crowley et al., 1987). Of 6000 clones obtained, 608 containing the longest MV-specific inserts were selected by colony hybridization and organized by reference to a previously characterized MV library (Dowling et al., 1986). Clones containing L gene sequences were identified by Northern blotting, and their exact positions relative to the MV genome were determined by a combination of colony hybridization, restriction endonuclease and S1 nuclease mapping, and primer extension experiments (Crowley et al., 1987). Fifteen of these clones contained inserts of >1000 bp, which greatly aided their mapping. Of the 608 selected clones, over 200 clones spanning the L gene region were mapped, a number in proportion to the size of the L gene relative to the MV genome.

Sequencing of MV L gene clones

Because of the presence of G/C tails in our clones, we chose as our primary tool the relatively new technique of supercoil plasmid sequencing by the dideoxy chain-termination method (Chen and Seeburg, 1985; Korneluk et al., 1985). In this technique, the plasmid DNA is alkali-denatured and then neutralized, leaving extensive looped-out single-strand regions. The use of commercial synthetic oligonucleotide primers specific for the PstI site of pBR322 (Wallace et al., 1981) and AMV reverse transcriptase (Life Sciences) enabled us to sequence through the G/C tails and to obtain sequences of >300 nucleotides from both ends of an insert. We followed a simple protocol (Pharmacia Technical Bulletin) that is rapid and well suited to sequencing clones with insert sizes of 400-600 bp. When applied to clones with longer inserts, however, this technique often left gaps in the sequence near the middle of the insert. We found that this problem was best overcome by using the remains of dideoxy reaction mixtures as probes for colony hybridization, to locate among our series of overlapping clones candidates with smaller inserts, or with insert ends near the gap. Overlapping clones identified in this manner also proved helpful in resolving compressions and other ambiguities due to secondary structure inherent in a particular insert. For complete sequencing of clones with large inserts, and to clarify any remaining ambiguities, we supplemented this procedure with Maxam-Gilbert (1980) chemical sequencing from appropriate internal restriction and end-labeling sites. Some of these sites were also used in conjunction with S1 nuclease mapping to determine the boundaries of the MV L gene (see Fig. 2 and also Crowley et al., 1987). Figure 1 presents a map of the MV L gene, showing the clones and restriction sites used for sequencing and S1 mapping.

Our sequencing strategy was dictated by the possession of an organized library of overlapping L clones, and all sequence analysis was performed on plasmid DNA. Every nucleotide reported was sequenced in at least two clones, and more than 90% of the sequence was determined in four or more clones. Altogether, the sequence reported is based on the determination of over 40,000 nucleotides. Some sequence variation between isomorphous clones was expected (Schubert et al., 1984). Where corresponding sequences did not coincide exactly, or where there was sequence ambiguity, we exploited the depth of our library by sequencing additional clones until a consensus sequence became apparent.

S1 nuclease mapping

The position of the MV L mRNA 3' end was determined as in previously published S1 nuclease mapping studies of MV intercistronic regions, including the H-L boundary (Crowley et al., 1987), based on techniques developed for mapping Sendai virus genes (Giorgi et al., 1983; Blumberg et al., 1984b, 1985a,b). Samples were electrophoresed on 8% sequencing gels.

Computer analysis of the MV L gene and protein and their homologies

The Pustell DNA and protein sequence analysis programs were purchased from IBI and used on an IBM PC/AT computer.
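The redundancy rule described under "Sequencing of MV L gene clones" (every nucleotide read in at least two clones, more than 90% in four or more, disagreements resolved by sequencing further clones until a consensus emerged) amounts to simple bookkeeping; over 40,000 determined nucleotides for a 6639-nt gene also works out to roughly six reads per position on average. The Python sketch below is a hypothetical illustration: the clone intervals and reads are invented, and calling the consensus by majority vote, with ties flagged as "N", is an assumption about how a consensus "becomes apparent", not the authors' documented procedure.

```python
from collections import Counter


def clone_depth(clones, length):
    """Per-position depth for clones given as 1-based inclusive (start, end) spans."""
    diff = [0] * (length + 1)  # difference array; the last slot absorbs end + 1
    for start, end in clones:
        diff[start - 1] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def majority_consensus(reads):
    """Column-wise majority call over equal-length aligned reads.

    '-' marks a position a read does not cover; a tied column is called 'N'
    to flag it for sequencing of an additional clone.
    """
    calls = []
    for column in zip(*reads):
        ranked = Counter(base for base in column if base != "-").most_common()
        if not ranked:
            calls.append("-")
        elif len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            calls.append("N")
        else:
            calls.append(ranked[0][0])
    return "".join(calls)


# Hypothetical 20-nt region covered by four clones.
depth = clone_depth([(1, 12), (5, 20), (1, 20), (9, 16)], 20)
print(min(depth) >= 2, sum(d >= 4 for d in depth) / len(depth))  # True 0.2
print(majority_consensus(["ACGT-", "ACGTA", "ATGTA"]))  # ACGTA
```

The difference array keeps the depth computation linear in the number of clones plus the gene length, which matters little at this scale but avoids re-scanning every clone at every position.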

MEASLES VIRUS L GENE SEQUENCE

RESULTS

MV L gene sequence

The nucleotide sequence of the MV L gene is presented in Fig. 2 as mRNA (+) sense DNA. The L gene begins with AGGGTCCAAGTGG, in accord with the H-L intercistronic sequence reported by Cattaneo et al. (1987) and our S1 mapping data (Crowley et al., 1987). The first AUG occurs only 23 nucleotides from the start of the L mRNA and is in a favored context (CCGxxAUGG) to serve as a eukaryotic initiator codon, according to Kozak (1984, 1986). This AUG begins a long open reading frame (ORF) that terminates with a UAA codon at position 6572, backed up at position 6614 by a second in-frame UAG termination codon. This long ORF contains 2183 codons and could code for a protein of 247,611 Da. In a separate communication, we report the determination of the final intercistronic and 5' end region of the MV genome. We show below that the L gene terminates with the sequence AATATATTAAAGAAAA, a variant of the canonical MV termination signal (Cattaneo et al., 1987). Excluding the final four A's, which form part of the poly(A) tail of the L mRNA, the MV L gene is 6639 nucleotides long and contains only 90 untranslated nucleotides.

S1 nuclease mapping of the L mRNA 3' end

The DNA sequence of the insert of clone 30 appeared to contain the canonical MV termination sequence, as well as a unique HinfI site near one end of the insert, at position 6444 of the L gene. We therefore used 3'-end-labeled HinfI restriction fragments of this clone to map the 3' end of the L mRNA (Fig. 3). In these experiments, digestion of some of the labeled material with PstI gives a band on denaturing gel electrophoresis that defines the insert length; thus, a band shorter than the distance from the labeling site to the G/C tail represents the end of an mRNA. In the left panel, a band at about 190 bp (arrow) appeared when the 3'-labeled DNA was annealed with CsCl pellet RNA from MV-infected cells and S1 nuclease treated (lanes 2 and 3). This band was strongest when a large quantity of RNA was used (lane 3) and was absent when the labeled digest was annealed with mock-infected RNA or with MV genomic RNA (lanes 4 and 5). Note that the HinfI digest pattern is visible in some lanes due to overloading. The PstI digest of the labeled DNA (lane 1) also revealed the 240-bp insert band (dot) just below the 246-bp vector band. This band was cut out of the gel for sequencing. The middle panel of Fig. 3 shows a close-up of the S1 experiment, along with the Maxam-Gilbert sequence of the 240-bp HinfI band. The T+C reaction unfortunately worked poorly; however, the pattern is distinctive, and the termination sequence can be clearly read from the autoradiogram of a different sequencing gel (right panel). In the close-up view, the S1 bands (arrows) are spread throughout the TTCTTTT termination sequence, most likely due to nibbling by the nuclease.

Coding potential of the L gene
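The ORF coordinates reported under "MV L gene sequence" are mutually consistent, and the arithmetic can be checked with a short Python sketch. The 49-nucleotide leader is transcribed from the opening of Fig. 2; the helper names are hypothetical, and the Kozak test is deliberately simplified to the two positions usually given most weight (a purine at -3 and G at +4), not a full context score.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

LEADER = "AGGGTCCAAGTGGTTCCCCGTTATGGACTCGCTATCTGTCAACCAGATC"  # start of Fig. 2
GENE_LENGTH = 6639   # nt, excluding the final four A's
SENSE_CODONS = 2183  # codons in the L ORF, stop codon not counted


def first_atg(seq: str) -> int:
    """Return the 1-based position of the first ATG, or -1 if absent."""
    idx = seq.find("ATG")
    return idx + 1 if idx >= 0 else -1


def kozak_ok(seq: str, atg_pos: int) -> bool:
    """Simplified Kozak check: purine at -3 and G at +4 relative to the A of ATG."""
    i = atg_pos - 1  # 0-based index of the A
    return seq[i - 3] in "AG" and seq[i + 3] == "G"


def translate(seq: str) -> str:
    """Translate the complete in-frame codons of a DNA string."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


start = first_atg(LEADER)                      # 23: 22 nt precede the AUG
stop = start + 3 * SENSE_CODONS                # 6572: first nt of the UAA
untranslated = GENE_LENGTH - 3 * SENSE_CODONS  # 90 = 22 (5') + 3 (UAA) + 65 (3')
backup_in_frame = (6614 - stop) % 3 == 0       # second stop lies 42 nt (14 codons) on

print(start, kozak_ok(LEADER, start), stop, untranslated, backup_in_frame)
print(translate(LEADER[start - 1:]))  # MDSLSVNQI
```

Because 23 leaves a remainder of 2 on division by 3, the AUG sits in reading frame 2, which agrees with the frame reported as open full-length in Fig. 4.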

[Figure 1: L gene map, scale 0 to 6.639 kb]

FIG. 1. Map of the MV L gene. Top to bottom: scale in kilobases; open reading frame; restriction sites used for Maxam-Gilbert sequencing and S1 nuclease mapping; results of S1 nuclease experiment to determine the L mRNA 3' end (see Fig. 3); clones used for sequencing (bars indicate the MV gene region represented by the inserts). Based on previously published data for the MV gene and intercistronic sequences (Rozenblatt et al., 1985; Bellini et al., 1985, 1986; Alkhatib and Briedis, 1986; Richardson et al., 1986; Cattaneo et al., 1987) and on our own sequence data (Crowley et al., 1988), the L gene starts at position 9210 of the genome.

The codon usage of the MV L gene is similar to that of VSV (Schubert et al., 1984), Sendai virus (Shioda et al., 1986; Morgan and Rakestraw, 1986), and NDV (Yusoff et al., 1987), and shows the expected bias against CpG dinucleotides characteristic of many mammalian mRNAs (Grantham, 1978; Bird, 1980). The nucleotide composition (29.4% A, 21.3% C, 22.7% G, 26.6% T) is also similar to that of VSV, Sendai virus, and NDV. Figure 4 shows the coding regions of the L gene according to the IBI/Pustell program (Pustell and Kafatos, 1984), with the termination codons in the six possible frames shown as T's in the narrow left and right panels. Note that only frame 2 (second from the left) is open full-length. The center panel shows the coding potential of the six L gene reading frames. Interestingly, according to this algorithm, which was intended to distinguish exons from introns in eukaryotic DNA, frame 2, which contains the putative L ORF, is not always strongly favored over all other potential reading frames. Frames 3 and 4 in particular are also predicted to have high coding potential in some regions of the L gene, even though they are substantially blocked. We find no indication of internal transcriptional start signals in these regions, nor have we found smaller transcripts from the L gene region (unpublished results). Moreover, the possible presence of genome (-) sense transcripts in RNAs extracted from MV-infected cells has been ruled out (Udem and Cook, 1984). Thus, barring the future demonstration of small functional proteins coded from internal regions of the MV L mRNA, we conclude that the only significant L gene product is the 247,611-Da protein.

[Figure 2: nucleotide sequence of the MV L gene, positions 1 to 6639, with the deduced amino acid sequence beneath]

FIG. 2. Sequence of the MV L gene and deduced L protein primary structure. The L gene sequence is presented as (+) DNA; the L protein is presented in standard single-letter amino acid code. GDDD and LDD homologies to postulated ancestral RNA polymerase (see text) are underlined. Asterisks denote double termination codons.

Predicted primary structure of the MV L protein

The predicted amino acid sequence of the MV L protein is shown in Fig. 2 beneath the gene sequence. Its length of 2183 amino acids lies in the middle of the range set by the L proteins of VSV (2109), Sendai virus (2048, Morgan and Rakestraw, 1986; or 2228, Shioda et al., 1986), and NDV (2204), and its deduced amino acid composition is comparable to that of these viruses. In particular, its Leu + Ile content of 18.6% is more than 1.5 times that of an "average" protein (Dayhoff et al., 1978). The MV L protein is predicted to be quite basic (pI = 9.45), on a par with that of VSV, and considerably more basic than the predicted L proteins of Sendai virus or NDV. The MV L protein contains 11 clusters of basic amino acids with a net charge of ≥+4, suggesting that some of these might represent domains involved in electrostatic interactions with the RNA genome. Uniquely among the sequences compared, the MV L protein displays a small hydrophobic region at its N-terminus.

[Figure 4: stop-codon panels for frames 1-3 (left) and 4-6 (right), with coding-potential bars for all six frames (center)]

FIG. 4. Coding potential of the MV L gene. The positions of nucleotides in the L gene or anti-L gene are shown beside the bands at the left and right, representing reading frames 1-3 and 4-6, respectively. The center panel shows the coding potential of the six frames (numbered bars), calculated using a "C statistic" derived by the product method from the coding bias table of the L gene with a window of 50; values above M = 1.7 are considered significant.

Homologies of MV L gene and protein

FIG. 3. Determination of the L mRNA 3' end by S1 nuclease mapping. Left panel, lane 1: HinfI digest of clone 30 plasmid DNA, 3'-end-labeled and redigested with PstI. Lanes 2 and 3: labeled DNA annealed with 20 or 100 µg of CsCl pellet RNA from MV-infected HeLa cells and S1 nuclease treated. Lanes 4 and 5: labeled DNA annealed with 1 µg of MV genomic RNA or 100 µg of CsCl pellet RNA from uninfected HeLa cells and S1 nuclease treated. Middle panel: lanes 1, 2, and 3 are the same as lanes 1, 3, and 5 of the left panel; Maxam-Gilbert sequence lanes are lettered. Right panel: expanded view of the Maxam-Gilbert sequence showing the L gene termination region (presented as minus-sense DNA).
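The homology comparisons in this section are dot-matrix comparisons: a window is slid along both sequences, a point is recorded wherever the two windows are sufficiently similar, and colinear relatedness shows up as a diagonal. The Python sketch below illustrates that general idea with plain identity counts; it is not the IBI/Pustell algorithm, whose scoring matrix and parameters are not given here, and the function names, window, threshold, and toy sequences are all hypothetical.

```python
from collections import Counter


def dot_plot(a, b, window=10, min_identical=6):
    """Return 1-based (i, j) window starts where a and b share at least
    `min_identical` identical residues over `window` aligned positions."""
    hits = []
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            same = sum(x == y for x, y in zip(wa, b[j:j + window]))
            if same >= min_identical:
                hits.append((i + 1, j + 1))
    return hits


def diagonal_profile(hits):
    """Count hits per diagonal offset (j - i); colinear homology piles up on one offset."""
    return Counter(j - i for i, j in hits)


# Hypothetical toy sequences: b is a with two residues prepended.
a = "ACDEFGHIKLMNPQRSTVWY"
b = "XX" + a
profile = diagonal_profile(dot_plot(a, b, window=5, min_identical=5))
print(profile.most_common(1))  # [(2, 16)]
```

A strongly related pair such as MV and Sendai virus L would concentrate hits on one offset along most of the length, whereas weak relatedness of the MV-VSV kind would leave only a few short runs on the diagonal.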

When the predicted MV L protein was compared with that of Sendai virus by means of the IBI/Pustell homology algorithm (Pustell and Kafatos, 1982), a strong diagonal along most of the length of the proteins indicated that these proteins are linearly and very closely related. When MV was next compared with NDV by this technique, using the same matrix parameters, the overall homology was weaker than with Sendai, but still strong over large regions of the L proteins of these paramyxoviruses. However, when MV was compared with VSV, homologies along the diagonal were reduced to three regions of low percentage match (data not shown). These results demonstrate that the L proteins are strongly conserved among paramyxoviruses, whereas MV has diverged considerably from VSV. This impression is reinforced by a similar comparison of the L gene sequences, where good correlation on the diagonal was observed between MV, Sendai virus, and NDV, whereas correlation between MV and VSV was no stronger than random scatter (data not shown).

Sequence homologies between MV, Sendai, NDV, and VSV L proteins are shown in Fig. 5A. Note that although the homologies with VSV are of modest degree, they are colinear with the paramyxovirus protein sequences, suggesting a direct evolutionary relationship between the L genes. Charged and hydrophobic amino acids are strongly conserved, and substitutions appear to preserve the physical properties of the conserved blocks. In the region of strongest VSV homology, amino acids 730-780 of MV, two blocks of amino acids, GGIEGYCQKLWTI and LVQGDNQTI, are conserved with precise spacing, suggesting that this extended region is vital for function. Noteworthy features of the first block are the paired glycine residues, echoed at positions 938-939, and the presence of the rare amino acid tryptophan, which may demarcate a surface region of the protein.

FIG. 5. (A) Comparison of predicted MV, Sendai virus (Shioda et al., 1986), NDV, and VSV L proteins, showing regions of highest sequence conservation. (B) GDD homologies in paramyxovirus L protein sequences.

MV L protein contains distinct elements of ancestral RNA polymerase

In addition to the extensive sequence homology with other paramyxoviruses described above, the MV L protein also contains regions of distinct homology to conserved sequences shown to be present in a number of RNA viruses and proposed to represent the nucleic acid recognition or active site of an ancestral RNA-dependent RNA polymerase (Kamer and Argos, 1984). The most characteristic feature of this conserved sequence is a central Gly-Asp-Asp (GDD) flanked by strongly hydrophobic regions. In the MV L protein there are two GDD sequences, at positions 3770-3779 and 4412-4420 of the L gene. Interestingly, each is augmented by a third D, and these GDDD sequences (underlined in Fig. 2) are flanked by predominantly hydrophobic amino acids (respectively, IATVYSWAYGDDDSSWNEAWLLA and ISALIGDDDINSFITEFLLI). The MV L protein also contains an LDD sequence at positions 3614-3622. A comparison of this region of the paramyxovirus L proteins is shown in Fig. 5B. The sequence surrounding the first GDDD of MV is very highly conserved in both Sendai and NDV, but in each case two of the D’s are altered. Interestingly, both alterations involve substitution of Glu for Asp, thus maintaining two negatively charged amino acids at the site of homology. In contrast, the surrounding hydrophobic sequence and the three D’s of the second GDDD of MV are preserved in Sendai virus, but there is no counterpart to this sequence in NDV. On the other hand, NDV and Sendai contain a strong homology to the LDD and flanking hydrophobic sequences of MV, except that the D’s are now separated by an intervening amino acid. Note that these hydrophobic flanking sequences include three tryptophan residues. Another striking feature is the pair of conserved cysteine residues at positions 1369-1370 (asterisk), which could anchor the putative active sites in the secondary structure of the L protein.

DISCUSSION

In this report, we describe the sequence determination of the MV L gene, deduce the primary structure of the MV L protein, and compare it with the deduced L proteins of two other paramyxoviruses, Sendai and NDV, and with that of VSV. The similar sizes and amino acid compositions of the L proteins, with Leu + Ile levels almost twice those of an “average” protein, together with the strong homologies between the predicted paramyxovirus L proteins, suggest that the deduced MV L protein sequence is probably correct. To date, however, only the VSV L protein sequence has been directly confirmed, by construction and expression of a full-length clone (Meier et al., 1987; Schubert et al., 1985).

The MV L protein exhibits one puzzling feature: its predicted N-terminal amino acid sequence is Met-Asp. The initiator Met of cytoplasmic proteins is frequently removed in vivo (Wold, 1981), and, according to the “N-end rule” of Bachmair et al. (1986), proteins with N-terminal Asp would be predicted to have a remarkably short half-life of 3 min in the cytoplasm. Although a low cytoplasmic concentration of MV L protein may well be favorable for transcriptional activity (Meier et al., 1987), there is otherwise no reason to expect it to be especially unstable. As L protein function continues throughout the MV replication cycle, it is likely that the N-terminus of MV L protein is protected in some manner in vivo.
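The two outcomes weighed here (removal of Met exposing Asp, or retention of Met) can be laid out as a small Python sketch. It assumes a general rule that this paper does not invoke, namely that methionine aminopeptidases remove the initiator Met only when the next residue has a small side chain (Gly, Ala, Ser, Cys, Pro, Thr, Val), and it encodes only two N-end-rule values: about 3 min for Asp, as cited above, and more than 20 h for Met, which Bachmair et al. (1986) classed as stabilizing. Function and constant names are illustrative, and N-acetylation is not modeled.

```python
# Penultimate residues after which methionine aminopeptidase is generally
# expected to remove the initiator Met (small side chains). This is a general
# rule assumed for the sketch, not a result from this paper.
SMALL_PENULTIMATE = frozenset("GASCPTV")

# Only the two N-end-rule classes needed here (Bachmair et al., 1986):
# Asp is destabilizing (~3 min), Met is stabilizing (>20 h).
N_END_HALF_LIFE = {"D": "~3 min", "M": ">20 h"}


def mature_n_terminus(seq: str) -> str:
    """Predict the N-terminal residue left after Met processing (no acetylation)."""
    if not seq.startswith("M"):
        raise ValueError("sequence should begin with the initiator Met")
    if len(seq) > 1 and seq[1] in SMALL_PENULTIMATE:
        return seq[1]  # Met cleaved; the penultimate residue becomes the N-terminus
    return "M"  # Met retained


def predicted_half_life(n_terminal: str) -> str:
    """Look up the N-end-rule class; residues not in the table are left unscored."""
    return N_END_HALF_LIFE.get(n_terminal, "not encoded in this sketch")


mv_l_start = "MD"  # predicted N-terminal dipeptide of MV L (Met-Asp)
print(mature_n_terminus(mv_l_start))                       # M
print(predicted_half_life(mature_n_terminus(mv_l_start)))  # >20 h
print(predicted_half_life(mv_l_start[1]))                  # ~3 min (if Met were removed)
```

Under this assumed rule, Asp at position 2 would leave the initiator Met in place; the sketch says nothing about whether folding or N-terminal blockage is actually involved.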
Post-translational processing of the Sendai virus N and M proteins, and of the VSV N protein, has indeed been documented by direct analysis of the blocked N-termini of these cytoplasmic proteins (Rose et al., 1984; Blumberg et al., 1984a, b). In each case, this processing involved cleavage of the initiator Met and N-acetylation of the following amino acid. As the mechanism proposed for protein destabilization involves preferential ubiquitination of certain N-termini (Bachmair et al., 1986; Fried et al., 1987), it is possible that N-terminal blockage, e.g., by acetylation, may confer stability by preventing ubiquitination. It is therefore tempting to propose that the N-terminus of the MV L protein may be processed to Ac-Asp in vivo; however, Asp is not among the N-terminal amino acids frequently found acetylated (Wold, 1981). Alternatively, the initiator Met may not be removed from the MV L protein, perhaps because of folding at its hydrophobic N-terminus.

The MV L gene and protein both exhibit very close homology with their Sendai virus counterparts, far stronger and more extensive than the homologies reported between any other two genes or proteins of these paramyxoviruses. We also found considerable homology between MV and NDV, though to a lesser extent than with Sendai, and only very distant homology between the MV and VSV L proteins. Interestingly, considerable homology was found between the L proteins of NDV and VSV (Yusoff et al., 1987), while very little homology was detected between Sendai and VSV. Taken together, these findings suggest an evolutionary pathway leading from VSV to NDV to Sendai to MV, but more sophisticated statistical analysis of the data will be needed. This pathway is also consistent with a narrowing of the host range of these viruses over the course of evolution, and suggests that the evolution in paramyxoviruses of two separate glycoprotein genes from the G gene region of rhabdoviruses may have been part of this process.

The deduced MV L protein shows distinct homologies with a highly conserved GDD sequence proposed to represent part of the active site of an ancestral RNA polymerase. However, the MV GDD homologies also differ significantly from those compiled by Kamer and Argos (1984) and Gustafson et al. (1987), which represent mainly (+)-strand RNA viruses and reverse transcriptases. The differences include the loss of the conserved upstream GxxxTxxxN sequence, extension of the GDD sequence to GDDD, duplication of the GDDD region, and divergence to an LDD region.
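The features that distinguish the MV motifs (a GDD core, its extension to GDDD, an LDD variant, hydrophobic flanks, and conservation of negative charge when Glu replaces Asp) can be checked mechanically on the two MV sequences quoted in the Results. The Python sketch below is not the authors' procedure: the motif pattern, the 8-residue flank width, and the set of residues counted as hydrophobic are assumptions made for illustration, and positions are 0-based offsets within each quoted fragment rather than L gene coordinates.

```python
import re

# Residues counted as hydrophobic in this sketch (an assumption; the paper
# does not define its criterion).
HYDROPHOBIC = set("AVILMFWYC")

# GDD core, optionally extended to GDDD, plus the LDD variant noted in the text.
MOTIF = re.compile(r"[GL]DDD?")


def net_charge(seq: str) -> int:
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (His and termini ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in HYDROPHOBIC; 0.0 for an empty flank."""
    return sum(a in HYDROPHOBIC for a in seq) / len(seq) if seq else 0.0


def scan(seq: str, flank: int = 8) -> list[tuple[int, str, float, float, int]]:
    """Return (0-based start, motif, left/right hydrophobic fraction, motif charge)."""
    hits = []
    for m in MOTIF.finditer(seq):
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        hits.append((m.start(), m.group(),
                     round(hydrophobic_fraction(left), 2),
                     round(hydrophobic_fraction(right), 2),
                     net_charge(m.group())))
    return hits


# The two MV GDDD regions quoted in the text.
for region in ("IATVYSWAYGDDDSSWNEAWLLA", "ISALIGDDDINSFITEFLLI"):
    print(scan(region))
# [(9, 'GDDD', 0.75, 0.5, -3)]
# [(5, 'GDDD', 0.8, 0.5, -3)]

# Asp -> Glu substitutions leave the crude charge unchanged (hypothetical variant).
print(net_charge("GDDD"), net_charge("GEED"))  # -3 -3
```

Both GDDD cores carry a crude charge of -3 with flanks that are at least half hydrophobic under the assumed residue set, and the hypothetical Glu-substituted variant keeps the same charge, the kind of conservation the text describes.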
These differences may relate to the profound divergence of replication strategy between the (+) RNA viruses, where the viral polymerase transcribes naked viral RNA, and the (-) RNA viruses, where the nucleocapsid structure is the obligatory template for the viral polymerase. Exact GDD sequences are not present in the L proteins of Sendai virus or NDV, although attention was called to weaker LDD and SDD sequence homologies (Shioda et al., 1986; Yusoff et al., 1987). However, the importance of pursuing these homologies lies in the likelihood that some of these sequences may comprise part of the nucleotide recognition and/or polymerization site of the L proteins, or perhaps vestiges of such sites in the course of evolutionary change. For example, Fig. 5A shows an RDDD sequence with hydrophobic flanking sequences at positions 525-527 of the NDV L protein that have strongly conserved counterparts in both MV and Sendai virus, with a net charge of -2 maintained by Asp to Glu mutations. This GDD homology would otherwise not be obvious in the L proteins of MV or Sendai virus. Likewise, the highly conserved QGDNQ block shown in Fig. 5A may represent a GDD homology with altered charge. The active sites in the MV L protein remain undefined. However, despite the large size of the paramyxovirus L proteins, their high degree of sequence conservation, which reflects their multifunctional nature, can be used to advantage in making structural comparisons. Using the more explicit homologies as a guide, and as more data become available, it may be possible to decipher the functional rules of the active sites in negative-strand RNA virus polymerases.

ACKNOWLEDGMENTS

We thank Eytan Young, Barry Schanzer, Leyla Arik, Deborah Schuback, and Dominador Manalo for technical assistance, and Daniel Kolakofsky, Noel Tordo, Oliver Poch, Lydie Bougueleret, and Amiya Banerjee for helpful discussions. This work was supported by grants from the Veterans Administration, the National Multiple Sclerosis Society, and the UMDNJ Foundation.

REFERENCES

ALKHATIB, G., and BRIEDIS, D. J. (1986). The predicted primary structure of the measles virus hemagglutinin. Virology 150, 479-490.
BACHMAIR, A., FINLEY, D., and VARSHAVSKY, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.
BELLINI, W. J., ENGLUND, G., RICHARDSON, C. D., ROZENBLATT, S., and LAZZARINI, R. A. (1986). Matrix genes of measles virus and canine distemper virus: Cloning, nucleotide sequences, and deduced amino acid sequences. J. Virol. 58, 408-416.
BELLINI, W. J., ENGLUND, G., ROZENBLATT, S., ARNHEITER, H., and RICHARDSON, C. D. (1985). Measles virus P gene codes for two proteins. J. Virol. 53, 908-919.
BIRD, A. P. (1980). DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8, 1499-1504.
BLUMBERG, B. M., GIORGI, C., KOLAKOFSKY, D., ROSE, K., and KOCHER, H. (1984a). Preparation and analysis of the nucleocapsid proteins of vesicular stomatitis virus and Sendai virus, and analysis of the Sendai virus leader-NP gene region. J. Gen. Virol. 65, 769-779.
BLUMBERG, B. M., GIORGI, C., ROSE, K., and KOLAKOFSKY, D. (1985a). Sequence determination of the Sendai virus fusion protein gene. J. Gen. Virol. 66, 317-331.
BLUMBERG, B. M., GIORGI, C., ROUX, L., RAJU, R., DOWLING, P. C., CHOLLET, A., and KOLAKOFSKY, D. (1985b). Sequence determination of the Sendai virus HN gene and its comparison to the influenza virus glycoproteins. Cell 41, 269-278.
BLUMBERG, B. M., ROSE, K., SIMONA, L., ROUX, L., GIORGI, C., and KOLAKOFSKY, D. (1984b). Analysis of the Sendai virus M gene and protein. J. Virol. 52, 656-663.
CATTANEO, R., REBMANN, G., SCHMID, A., BACZKO, K., TER MEULEN, V., and BILLETER, M. A. (1987). Altered transcription of a defective measles virus genome derived from a diseased human brain. EMBO J. 6, 681-688.
CHEN, E. Y., and SEEBURG, P. H. (1985). Supercoil sequencing: A fast and simple method for sequencing plasmid DNA. DNA 4, 165-170.
CHOPPIN, P. W., and COMPANS, R. W. (1975). Reproduction of paramyxoviruses. In “Comprehensive Virology” (H. Fraenkel-Conrat and R. R. Wagner, Eds.), Vol. 4, pp. 95-178. Plenum, New York.
CROWLEY, J., DOWLING, P. C., MENONNA, J., SCHANZER, B., YOUNG, E., COOK, S. D., and BLUMBERG, B. M. (1987). Molecular cloning of 99% of measles virus genome, positive identification of 5’ end clones, and mapping of the L gene regions. Intervirology 28, 65-77.
CROWLEY, J. C., DOWLING, P. C., MENONNA, J., SILVERMAN, J. I., SCHUBACK, D., COOK, S. D., and BLUMBERG, B. M. (1988). Sequence variability and function of measles virus 3’ and 5’ end and intercistronic regions. Virology 164, 498-506.
DAYHOFF, M. O., HUNT, L. T., and HURST-CALDERONE, S. (1978). Composition of proteins. In “Atlas of Protein Sequence and Structure” (M. O. Dayhoff, Ed.), Vol. 5, Suppl. 3, pp. 363-373. Natl. Biomed. Res. Found., Washington, DC.
DOWLING, P. C., BLUMBERG, B. M., MENONNA, J., ADAMUS, J. E., COOK, P., CROWLEY, J. C., KOLAKOFSKY, D., and COOK, S. D. (1986). Transcriptional map of the measles virus genome. J. Gen. Virol. 67, 1987-1992.
DOWLING, P. C., GIORGI, C., ROUX, L., DETHLEFSEN, L. A., GALANTOWICZ, M. E., BLUMBERG, B. M., and KOLAKOFSKY, D. (1983). Molecular cloning of the 3’-proximal third of Sendai virus genome. Proc. Natl. Acad. Sci. USA 80, 5213-5216.
FRIED, V. A., SMITH, H. T., HILDEBRANDT, E., and WEINER, K. (1987). Ubiquitin has intrinsic proteolytic activity: Implications for cellular regulation. Proc. Natl. Acad. Sci. USA 84, 3685-3689.
GIORGI, C., BLUMBERG, B. M., and KOLAKOFSKY, D. (1983). Sendai virus contains overlapping genes expressed from a single mRNA. Cell 35, 829-836.
GRANTHAM, R. (1978). Viral, prokaryote and eukaryote genes contrasted by mRNA sequence index. FEBS Lett. 95, 1-11.
GUSTAFSON, G., HUNTER, B., HANAU, R., ARMOUR, S. L., and JACKSON, A. O. (1987). Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma. Virology 158, 394-406.
GUSTAFSON, T. L., LIEVENS, A. W., BRUNELL, P. A., MOELLENBERG, R. G., CHRISTOPHER, B. S., BUTTERY, M. G., and SEHULSTER, L. M. (1987). Measles outbreak in a fully immunized secondary-school population. N. Engl. J. Med. 316, 771-774.
HEGGENESS, M. H., SCHEID, A., and CHOPPIN, P. W. (1980). Conformation of the helical nucleocapsids of paramyxoviruses and vesicular stomatitis virus: Reversible coiling and uncoiling induced by changes in salt concentration. Proc. Natl. Acad. Sci. USA 77, 2631-2635.
IVERSON, L. E., and ROSE, J. K. (1981). Localized attenuation and discontinuous synthesis during vesicular stomatitis virus transcription. Cell 23, 477-484.
KAMER, G., and ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.
KINGSBURY, D. W. (1974). The molecular biology of paramyxoviruses. Med. Microbiol. Immunol. 160, 73-83.
KINGSBURY, D. W., BRATT, M. A., CHOPPIN, P. W., HANSON, R. P., HOSAKA, Y., TER MEULEN, V., NORRBY, E., PLOWRIGHT, W., ROTT, R., and WUNNER, W. H. (1978). Paramyxoviridae. Intervirology 10, 137-152.
KORNELUK, R. G., QUAN, F., and GRAVEL, R. A. (1985). Rapid and reliable dideoxy sequencing of double-stranded DNA. Gene 40, 317-323.
KOZAK, M. (1984). Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 12, 857-872.
KOZAK, M. (1986). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
LEPPERT, M., RITTENHOUSE, I., PERRAULT, J., SUMMERS, D. F., and KOLAKOFSKY, D. (1979). Plus- and minus-strand leader RNAs in negative strand virus infected cells. Cell 18, 735-747.
MAXAM, A. M., and GILBERT, W. (1980). Sequencing end-labelled DNA with base-specific chemical cleavages. In “Methods in Enzymology” (L. Grossman and K. Moldave, Eds.), Vol. 65, pp. 499-560. Academic Press, New York.
MEIER, E., HARMISON, G. G., and SCHUBERT, M. (1987). Homotypic and heterotypic exclusion of vesicular stomatitis virus replication by high levels of recombinant polymerase protein L. J. Virol. 61, 3133-3142.
MORGAN, E. M., and RAKESTRAW, K. M. (1986). Sequence of the Sendai virus L gene: Open reading frames upstream of the main coding region suggest that the gene may be polycistronic. Virology 154, 31-40.
MORGAN, E. M., and RAPP, F. (1977). Measles virus and its associated diseases. Bacteriol. Rev. 41, 636-666.
NEUBERT, W. J., and KOCH, E. M. (1985). Molecular clones representing Sendai virus L-gene: Sequence of 1,179 nucleotides from the 3’ end of the L-gene. Zentralbl. Bakteriol. Microbiol. Hyg. B 260, 498-499.
PUSTELL, J., and KAFATOS, F. C. (1982). A high speed, high capacity homology matrix: Zooming through SV40 and polyoma. Nucleic Acids Res. 10, 4765-4782.
PUSTELL, J., and KAFATOS, F. C. (1984). A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res. 12, 643-655.
RICHARDSON, C., HULL, D., GREER, P., HASEL, K., BERKOVICH, A., ENGLUND, G., BELLINI, W., RIMA, B., and LAZZARINI, R. (1986). The nucleotide sequence of the mRNA encoding the fusion protein of measles virus (Edmonston strain): A comparison of fusion proteins from several different paramyxoviruses. Virology 155, 508-523.
RIMA, B. K., BACZKO, K., CLARKE, D. K., CURRAN, M. D., MARTIN, S. J., BILLETER, M. A., and TER MEULEN, V. (1986). Characterization of clones for the sixth (L) gene and a transcriptional map for morbilliviruses. J. Gen. Virol. 67, 1971-1978.
ROSE, K., KOCHER, H. P., BLUMBERG, B. M., and KOLAKOFSKY, D. (1984). An improved procedure, involving mass spectrometry, for N-terminal amino acid sequence determination of proteins which are Nα-blocked. Biochem. J. 217, 253-257.
ROZENBLATT, S., EIZENBERG, O., BEN-LEVY, R., LAVIE, V., and BELLINI, W. J. (1985). Sequence homology within the morbilliviruses. J. Virol. 53, 684-690.
SCHUBERT, M., HARMISON, G. G., and MEIER, E. (1984). Primary structure of the vesicular stomatitis virus polymerase (L) gene: Evidence for a high frequency of mutations. J. Virol. 51, 505-514.
SCHUBERT, M., HARMISON, G. G., RICHARDSON, C. D., and MEIER, E. (1985). Expression of a cDNA encoding a functional 241-kilodalton vesicular stomatitis virus RNA polymerase. Proc. Natl. Acad. Sci. USA 82, 7984-7988.
SHIODA, T., IWASAKI, K., and SHIBUTA, H. (1986). Determination of the complete nucleotide sequence of the Sendai virus genome RNA and the predicted amino acid sequences of the F, HN and L proteins. Nucleic Acids Res. 14, 1545-1563.
UDEM, S. A. (1984). Measles virus: Conditions for the propagation and purification of infectious virus in high yield. J. Virol. Methods 8, 123-136.
UDEM, S. E., and COOK, K. A. (1984). Isolation and characterization of measles virus intracellular nucleocapsid RNA. J. Virol. 49, 57-65.
WALLACE, R. B., JOHNSON, M. J., SUGGS, S. V., MIYOSHI, K., BHATT, R., and ITAKURA, K. (1981). A set of synthetic oligodeoxyribonucleotide primers for DNA sequencing in the plasmid vector pBR322. Gene 16, 21-26.
WILD, T. F. (1985). Measles vaccination: Is a new strategy needed in Third World countries? Vaccine 3, 282.
WOLD, F. (1981). In vivo chemical modification of proteins (posttranslational modification). Annu. Rev. Biochem. 50, 783-814.
YOSHIKAWA, Y., MIZUMOTO, K., and YAMANOUCHI, K. (1986). Characterization of messenger RNAs of measles virus. J. Gen. Virol. 67, 2807-2812.
YUSOFF, K., MILLAR, N. S., CHAMBERS, P., and EMMERSON, P. T. (1987). Nucleotide sequence analysis of the L gene of Newcastle disease virus: Homologies with Sendai and vesicular stomatitis viruses. Nucleic Acids Res. 15, 3961-3976.