Heterogeneity and 5′-terminal structures of the late RNAs of simian virus 40


J. Mol. Biol. (1978) 126, 813-846

P. K. GHOSH, V. B. REDDY, J. SWINSCOE, P. LEBOWITZ AND S. M. WEISSMAN

Departments of Internal Medicine and Human Genetics, Yale Medical School, New Haven, Conn. 06510, U.S.A.

(Received 15 June 1978, and in revised form 20 July 1978)

We have investigated the 5′-terminal structures of the late lytic RNAs of simian virus 40 (SV40) by binding the plus strand of small DNA fragments labeled in the 5′-terminal position with 32P to specific regions of cytoplasmic polyadenylated late RNA, extending these “primer” fragments in a 3′ direction with reverse transcriptase, fractionating the extended products on denaturing polyacrylamide gels and performing DNA sequence analyses on the extended products. Extension of a primer bound to the 5′ terminus of the body of 19 S RNA revealed a multiplicity of extended products. RNAs from which these products are derived fall into four classes, the first containing sequences colinear with SV40 DNA and the latter three containing splices which fuse the 5′ terminus of the 19 S body at residue 476 (0.765 map units) to residues 444, 291 and 212, respectively. Of 15 extended products we have analyzed, eight have 3′ termini with the sequence T-A(A). These termini lie at positions 243, 182, 110, 55, and 5189 on the SV40 genome. A number of lines of evidence, including nucleotide sequence analysis of late RNA labeled in vivo and the fact that the principal sequence in SV40 late mRNAs adjacent to capped structures is A-U(U), suggest that the termini at positions 243 and 182 correspond to the 5′ termini of discrete in vivo 19 S RNAs and that stops at the shorter positions may also correspond to the 5′ termini of specific in vivo species of 19 S RNA. An additional extended product was observed with a sequence colinear with SV40 DNA and a 3′ terminus of T-A-A at residue 548 (0.779 map units). It contains sequences complementary to the VP3 but not the VP2 initiation codons of 19 S RNA.
For each of the three different gaps we have found in the late 19 S RNAs, as well as four additional gaps we have analyzed in the late 16 S and early 19 S RNAs of SV40, identical di-, tri- or tetranucleotide sequences lie on the ungapped precursor at sites which undergo splicing; these sequences may be involved in determining the specificity of the splicing reaction.

1. Introduction

The simian virus 40 (SV40) genome is divided into two distinct template regions. The early gene region occupies approximately 50% of the viral minus strand from 0.67 to 0.165 map units as shown in Figure 1 (Khoury et al., 1972,1975; Lindstrom & Dulbecco, 1972; Sambrook et al., 1972; Dhar et al., 1974,1975,1977a,b). Throughout lytic infection it is transcribed in a counterclockwise direction (Khoury et al., 1973; Sambrook et al., 1973) into a minimum of three distinct 19 S mRNAs (Berk & Sharp, 1978; Crawford et al., 1978; Reddy et al., 1978b; Reddy, V. B., Ghosh, P. K., Swinscoe, J.,


FIG. 1. Map of the SV40 genome demonstrating the DNA fragments derived by digestion with the HindII and HindIII restriction endonucleases, the genomic templates for 3 early 19 S RNAs, the body and leader of the principal 16 S late mRNA, the putative 18.5 S late RNA and the body and leader region of a multiplicity of 19 S RNA species. Also shown are the initiation and termination sites on these RNAs for synthesis of early T and t antigens and the VP1, VP2 and VP3 capsid proteins. Map units given in decimals represent fractional genomic lengths from the unique EcoRI restriction endonuclease cleavage site within HindII, HindIII-F. Tangentially oriented arrowheads indicate the directions of transcription and 3′ ends of early and late RNAs.

Lebowitz, P. and Weissman, S. M., manuscript in preparation) which serve as the templates for the 90,000 to 94,000 molecular weight A protein or large T antigen, the 17,000 to 19,000 molecular weight small t antigen (Prives et al., 1975,1977; Carrol & Smith, 1976; Crawford et al., 1978) and possibly other early proteins. The late gene region resides on 50% of the plus DNA strand from approximately 0.67 to 0.170 map units (see Fig. 1) (Khoury et al., 1972,1975; Lindstrom & Dulbecco, 1972; Sambrook et al., 1972; Dhar et al., 1974,1975,1977a). Following the initiation of DNA replication, the late region is transcribed in a clockwise direction into stable mRNAs of two size classes, 16 S and 19 S (Weinberg et al., 1972,1974; Aloni, 1974), which in turn code for synthesis of three late viral capsid proteins designated VP1, VP2 and VP3 (Prives et al., 1974a,b,1975; A. Smith, personal communication). It has recently been suggested from experiments with cells infected with non-leaky SV40 tsA mutants and transcriptional complexes isolated from such cells that the late gene region is also transcribed prior to the onset of DNA replication (Ferdinand et al., 1977; Khoury & May, 1977). However, if this is the case in vivo, late transcripts must be degraded rapidly, since stable plus strand transcripts do not accumulate in lytic infection prior to the onset of DNA synthesis. Nucleic acid sequencing studies performed on polyadenylated cytoplasmic mRNAs harvested late in lytic infection suggested to us several years ago that transcription of the late region is extremely complex (Dhar et al., 1974,1975,1977a). Specifically, we found that transcripts from the 0.72 to 0.76 and 0.77 to 0.98 map unit regions


were plentiful, while transcripts from the intervening region from 0.76 to 0.77 map units were essentially missing from cytoplasmic mRNA, and transcripts from the 0.67 to 0.72 map unit region were present in limited quantities. From these data, it was suggested that the bulk of the transcripts from the 0.72 to 0.76 map unit region was not linked to transcripts from the 0.77 to 0.98 map unit region. The region from 0.721 to 0.760 map units has now been shown to be 203 nucleotides in length (Ghosh et al., 1978) and to constitute the principal 5′ leader sequence of late 16 S mRNA (Aloni et al., 1977; Celma et al., 1977a,b; Dhar et al., 1977; Hsu & Ford, 1977; Lavi & Groner, 1977; Ghosh et al., 1978). The main segment of this RNA, to which the leader is attached by a 3′:5′ phosphodiester bond (Ghosh et al., 1978), is 1212 nucleotides in length and derived from the genomic region from 0.939 to 0.170 map units (Dhar et al., 1974,1975; Khoury et al., 1976; May et al., 1976,1977). The body of 16 S mRNA contains 362 nucleotide triplets from 0.947 to 0.155 map units, which code for VP1 protein (Van de Voorde et al., 1974; Pan et al., 1977), and 16 S mRNA directs the synthesis of VP1 in an in vitro protein-synthesizing system (Prives et al., 1975). It is also known that the main body of late 19 S mRNA is transcribed from the 0.76 to 0.17 map unit region of the viral plus strand (Weinberg & Newbold, 1974; Khoury et al., 1976; May et al., 1977; Thimmappaya et al., 1978). Studies with SV40 deletion mutants within this genomic region (Cole et al., 1977) and examination of tryptic peptide maps of VP2 and VP3 (Gibson, 1975) have together suggested that VP2 is encoded in 19 S mRNA from approximately 0.77 to [illegible] map units and that VP3 is encoded by the same sequence of nucleotides specifying the carboxy-terminal amino acids of VP2.
In fact, the body of the 19 S mRNA contains an initiation AUG codon near its 5′ terminus followed by 355 consecutive coding triplets and then a terminator UAA that have been suggested as the coding region for VP2 (Reddy et al., 1978; Fiers et al., 1978). The 118th codon within this stretch of sense codons is another AUG triplet, which has been suggested as the initiation codon for VP3 synthesis. 19 S mRNA directs synthesis in vitro of a polypeptide related to VP2 but it may not direct synthesis of VP3 (Prives et al., 1974a,b). Since eucaryotic mRNAs behave as monocistronic messages, i.e. serve as templates for translation of only one polypeptide chain, and since ribosomes usually initiate translation at the AUG initiation sequence closest to the 5′ terminus of a message, the question arises as to whether an mRNA shorter than 19 S mRNA and lacking the VP2 initiation codon may serve as the template for VP3. In SV40, such an RNA has never been recognized, but in the related virus, polyoma, an 18 S mRNA coding in an in vitro protein-synthetic system for VP3 has been separated from 19 S mRNA (W. Gibson, personal communication; A. Smith, personal communication). We have been studying the structure of the late SV40 RNAs by annealing the plus strand of small DNA fragments with 32P in a 5′-terminal position to various regions of the late SV40 RNAs, extending these bound fragments in a 3′ direction (5′ with respect to template RNA) with reverse transcriptase, separating the extended products on polyacrylamide gels and performing nucleotide sequence analyses on the separated extended products. Comparison of the nucleotide sequences of extended products with that of SV40 DNA establishes whether the templates for individual products are colinear or non-colinear copies of viral DNA, may give information on the 5′ termini of RNA templates, and overall provides insight into the heterogeneity of a population of RNA molecules.
This paper is concerned primarily with analysis of the products obtained by


extension of primers bound to cytoplasmic late RNA at the 5′ terminus of the body of 19 S RNA. Specifically, we have detected four categories of extension products, indicative of four different classes of late 19 S RNA. One class of RNA is colinear with SV40 DNA at its 5′ terminus, while three classes contain splices which fuse residue 476 at the 5′ terminus of the body of this RNA to residues 444, 291 or 212. The most abundant 19 S species contain the splice from residues 291 to 476. We have also found that the extension products terminate at a number of different sites. Multiple species stop at residues 182 and 243 with the sequence T-A-A (the principal sequence adjacent to caps in SV40 late mRNA is the complement, A-U-U (Lavi & Shatkin, 1975; Groner et al., 1977; Haegeman & Fiers, 1978a)) and there is reason to believe that these sites represent the 5′ termini of in vivo species of 19 S RNAs. The available evidence also suggests that extended products stopping at other sites with the sequences T-A(A) and T-A, as well as certain other sequences, may also arise from discrete in vivo RNAs, but additional evidence will be required to prove this contention. One 19 S RNA leader sequence extending from residues 243 to 444 is identical to the leader of the principal 16 S late RNA (Ghosh et al., 1978; Haegeman & Fiers, 1978b). We cannot be certain at this point which of the multiple late RNA species we have described serve as intracellular mRNAs; to simplify discussion in this paper, we refer to all late RNAs, both gapped and ungapped, simply as RNAs.
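Positions in this paper are quoted both as residue numbers and as map units (for example, residue 476 at 0.765 map units). The following is a minimal sketch of that conversion, not the authors' procedure. It assumes a linear relation anchored on the 476/0.765 pair and a circular genome of 5226 base pairs; the genome length and the names `GENOME_LENGTH` and `residue_to_map_units` are assumptions introduced for illustration and do not come from the text.

```python
# Assumption: the residue numbering covers a circular genome of 5226 base
# pairs (a value not given in the text) and map units increase linearly
# with residue number.
GENOME_LENGTH = 5226
ANCHOR_RESIDUE = 476      # residue quoted in the text ...
ANCHOR_MAP_UNIT = 0.765   # ... together with this map position


def residue_to_map_units(residue: int) -> float:
    """Convert a residue number to a fractional genome length (map units)."""
    offset = (residue - ANCHOR_RESIDUE) / GENOME_LENGTH
    # Wrap around the circular map so the result stays in [0, 1).
    return (ANCHOR_MAP_UNIT + offset) % 1.0


if __name__ == "__main__":
    for residue in (243, 291, 444, 476, 548):
        print(residue, round(residue_to_map_units(residue), 3))
```

Under these assumptions the sketch reproduces several residue and map-unit pairs quoted in the text to three decimals, which suggests the linear model is adequate for orientation, though not for exact assignments.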

2. Materials and Methods

(a) SV40 DNA preparation and generation of restricted DNA fragments

SV40 (strain 776) was grown in the Vero line of African green monkey cells and unlabeled viral DNA I purified as previously described (Zain et al., 1973). Restriction enzymes from Escherichia coli carrying RII factor (EcoRII), Hemophilus aegyptius (HaeIII), Arthrobacter luteus (Alu) and Anabaena variabilis (AvaII) were purchased from New England Biolabs or Bethesda Research Labs, Inc. Conditions for digestion of SV40 DNA I with the various enzymes were those specified by these laboratories. Digests of DNA were electrophoresed on 4 to 8% polyacrylamide gels as previously described (Danna & Nathans, 1971) to separate the resultant fragments. Following staining of the gels with ethidium bromide (1 μg/ml), individual fragments were recovered from gels by excision, homogenization in 2 to 3 ml of 0.1 x SSC (SSC is 0.15 M-NaCl, 0.01 M-sodium citrate, pH 7.0) at low speed in a Tekmar homogenizer, centrifugation at 20,000 revs/min in the Sorvall SS34 rotor to pellet acrylamide and precipitation of DNA with 2 vol. ethanol. Three specific restriction endonuclease fragments were prepared from SV40 DNA I. The EcoRII-N fragment was obtained by 1-step digestion of DNA with the EcoRII enzyme. This enzyme cleaves DNA in a staggered fashion with specificity for the sequence ↓C-C-(A/T)-G-G (Bigger et al., 1973), where the sequence is read in a 5′ to 3′ direction and the arrow indicates the site of cleavage. Thus, as shown in Fig. 2, the plus strand of the EcoRII-N fragment extends from residues 100 to 154 on the map of SV40 (Reddy et al., 1978b) (0.693 to 0.704 map units). A second fragment extending from 0.803 to 0.810 map units was prepared by first digesting viral DNA with the HaeIII endonuclease and isolating the HaeIII-J fragment, extending from 0.803 to 0.838 map units, and then cleaving this fragment with the Alu enzyme and isolating the middle-sized of the 3 resultant fragments on an 8% polyacrylamide gel. The Alu enzyme cleaves HaeIII-J at 0.810 and 0.813 map units; thus the middle-sized fragment must extend from 0.803 to 0.810 map units. The HaeIII and Alu enzymes cleave at G-G↓C-C (Murray & Old, 1974) and A-G↓C-T sequences (Roberts et al., 1976), respectively, thus permitting assignment of the expanse of the 0.803 to 0.810 map unit fragment from residues 676 to 711 on the


map of SV40 DNA. The third fragment extending from 0.766 to 0.772 map units was obtained by digesting SV40 DNA with the Alu enzyme, isolating the Alu-K fragment (0.743 to 0.777 map units), and redigesting it with the AvaII enzyme. This enzyme cuts double-stranded DNAs in a staggered fashion to yield 5′-terminal G-A-C and G-T-C sequences (Murray et al., 1976) and its cleavage of Alu-K yields 3 fragments, the smallest of which contains a plus strand extending from 0.766 to 0.772 map units (residues 479 to 509 in Fig. 2). All 3 fragments of SV40 DNA were labeled at their 5′ termini by incubation with [γ-32P]ATP (4000 to 5500 Ci/mmol; New England Nuclear) and polynucleotide kinase (Miles Co.) as described elsewhere (Reddy et al., 1978a).

(b) Preparation of late lytic RNAs

For isolation of late lytic RNAs, Vero cells were grown to confluence in 32 oz culture bottles or 750 cm2 roller bottles in Earle's minimal essential medium containing 10% fetal calf serum, 2 mM-glutamine, 250 IU penicillin/ml and 250 μg streptomycin/ml. Following decantation of this medium and rinsing with Earle's normal saline, cells were infected with 20 to 30 plaque-forming units of SV40/cell in this same medium containing 2% heat-inactivated agammaglobulinemic calf serum in place of 10% fetal calf serum. At 48 h post-infection cells were harvested by treatment with trypsin-EDTA, pelleted and washed twice with Earle's normal saline. Immediately after harvesting, infected cells were fractionated into nuclei and cytoplasm and RNA was extracted from the cytoplasmic fraction as described by Penman (1966) with 2 modifications. First, cells were ruptured by 6 forced expulsions through a 26-gauge steel hypodermic needle as well as 10 strokes in a Dounce homogenizer; and secondly, the cytoplasmic fraction was digested with Proteinase-K (Merck), 500 μg/ml, in 0.01 M-Tris·HCl (pH 7.4), 0.01 M-NaCl, 0.003 M-MgCl2, 0.5% sodium dodecyl sulfate and 1 mM-EDTA for 2 h at 37°C prior to extraction of RNA with phenol and chloroform/isoamyl alcohol. The former modification resulted in 100% rupture of cells and the latter in reduction in the quantity of aggregated protein at the phenol-aqueous interface during RNA extraction. Then 0.1 vol. 20% potassium acetate (pH 5.4) and 2 vol. ethanol were added to the aqueous phases to precipitate RNA. Precipitated RNA was taken up in 0.5 M-KCl, 0.01 M-Tris·HCl (pH 7.5) and charged onto columns of oligo(dT)-cellulose in this same buffer. Polyadenylated RNAs retained by these columns were eluted with 0.01 M-Tris·HCl (pH 7.5) and water (Aviv & Leder, 1972) and reprecipitated with potassium acetate and ethanol.

(c) Hybridization of DNA fragments to late RNA and primer extension

DNA fragments labeled in the 5′-terminal positions with 32P were annealed to late cytoplasmic polyadenylated RNAs by the procedure of Casey & Davidson (1977). From [illegible] to 50 μl of each DNA fragment representing from 0.75 to 1.5 μg of DNA were taken up in 150 μl of 80% formamide (Eastman, deionized by stirring overnight with Biorad AG501-X8 mixed bed resin), 0.4 M-NaCl, 0.01 M-PIPES (piperazine-N,N′-bis(2-ethane sulfonic acid)), pH 6.4, and held at 100°C for 5 min to denature DNA. Late cytoplasmic RNA, 200 to 600 μg in 150 μl of this same buffer, was then added to each denatured DNA fragment and the hybridization mixture brought to 85°C for 5 min and then incubated overnight at 50°C. At the conclusion of the annealing reaction, each reaction mixture was diluted 10-fold with 0.5 M-KCl, 0.01 M-Tris·HCl (pH 7.5) and chromatographed on oligo(dT)-cellulose as noted above. Duplex molecules composed of the 32P-labeled plus strands of DNA fragments and poly(A) terminal viral RNA, eluting with 0.01 M-Tris·HCl and water, were precipitated with 2 vol. ethanol in the presence of 0.3 M-sodium acetate. With the RNAs of the hybrid molecules serving as templates and the bound plus strands of radiolabeled DNA fragments as primers, reverse transcriptase-catalyzed primer extension reactions were next carried out. In the extensions, primers are extended in a 5′ to 3′ direction by the sequential addition of deoxynucleoside monophosphates. Extension reactions contained, in a total volume of 200 μl, 50 mM-Tris·HCl (pH 8.3), 6 mM-magnesium acetate, 60 mM-NaCl, 10 mM-dithiothreitol, 1 mM each of unlabeled dCTP, dTTP, dGTP and dATP, 5 units of avian myeloblastosis virus reverse transcriptase

FIG. 2. A combined SV40 plus strand DNA sequence (lower line in the nucleotide sequence) and complementary RNA sequence (upper line) over a stretch of the late gene region giving rise to the 5′ termini of the late RNAs. The DNA sequence is taken from Reddy et al. (1978b).

(a generous gift from Dr J. Beard) and DNA-RNA hybrid molecules prepared from 0.75 to 1.5 μg of DNA. Reverse transcriptase was either used directly as obtained or first chromatographed on a column of Biogel P200 in 0.2 M-KPO4 (pH 7.4), 2 mM-dithiothreitol, 0.2% Triton X-100 and 10% glycerol to remove traces of RNase which were present in certain preparations of the enzyme (P. Berg & J. Wahl, personal communication). Reactions were carried out at 41°C for 3 h, after which NaOH was added to a final concentration of [illegible] M and the incubations continued for another hour to degrade the RNA templates. Following neutralization with 1 M-HCl and addition of sodium dodecyl sulfate to 0.5%, the reaction mixtures were extracted twice with water-saturated phenol, after which 32P-labeled extended DNA primers were precipitated with ethanol and sodium acetate as noted above.

(d) Analyses of extended primers

Extended primers were fractionated on 8% polyacrylamide slab gels containing 7 M-urea in the Tris/EDTA/borate buffer system of Peacock & Dingman (1968). These gels were 40 cm in length, 16 cm in width and 2 mm in thickness. They were run at 400 to 800 V until marker bromphenol blue dye reached the bottom of the slab. Wet gels were promptly autoradiographed and individual extended products excised and recovered as noted above. Extended products were then subjected to nucleotide sequence analyses by the method of Maxam & Gilbert (1977), size analysis by re-electrophoresis on polyacrylamide/7 M-urea gels with single-stranded SV40 DNA fragments of known length serving as size markers and/or susceptibility to digestion with HaeIII restriction enzyme followed by size analysis of the resultant radiolabeled 5′-terminal fragments. The nucleotide sequences of the extended primers were compared with the known sequence of SV40 DNA (Reddy et al., 1978b) (Fig. 2) to determine whether the template RNAs were colinear copies of SV40 DNA or had undergone splicing and fusion of RNA segments.
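The comparison described above, between the sequence of an extended primer and the known DNA sequence, can be sketched as a small routine. This is an illustrative sketch only and not the authors' procedure: the function name `classify_extension` and the short toy reference sequence are hypothetical, and both sequences are assumed to be written in the same direction. The routine follows the read along the reference until the two diverge, then searches the reference for the remainder of the read.

```python
def classify_extension(read: str, reference: str, start: int) -> dict:
    """Compare an extended-primer read with a reference strand.

    `read` and `reference` are written in the same direction; `start` is the
    0-based index in `reference` at which the read begins.
    """
    # Count how far the read stays colinear with the reference.
    matched = 0
    while (matched < len(read)
           and start + matched < len(reference)
           and read[matched] == reference[start + matched]):
        matched += 1
    if matched == len(read):
        return {"colinear": True, "matched": matched, "rejoin_sites": []}
    # The rest of the read no longer matches: look for it elsewhere.
    rest = read[matched:]
    sites = []
    pos = reference.find(rest)
    while pos != -1:
        sites.append(pos)
        pos = reference.find(rest, pos + 1)
    return {"colinear": False, "matched": matched, "rejoin_sites": sites}


if __name__ == "__main__":
    # Toy reference (hypothetical, not SV40 sequence): body, gap, leader.
    toy_reference = "GCTGGACCTCAGATAGATGAACCTTAACGGAG"
    print(classify_extension("TGGACCTTAACGG", toy_reference, 2))
    print(classify_extension("TGGACCTCAG", toy_reference, 2))
```

A single rejoin site indicates an unambiguous splice partner. Because a short sequence may be duplicated at both sides of a junction, the point at which colinearity appears to break can lie a few residues beyond the true junction.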

3. Results

(a) Extension of the 0.765 to 0.772 and 0.693 to 0.704 map unit primers

Figure 3(b) shows the 8% polyacrylamide/7 M-urea gel electrophoretic pattern of radiolabeled products obtained when the 0.765 to 0.772 map unit AvaII fragment of SV40 DNA, complementary to the 5′ terminus of the body of late 19 S RNA, was annealed to polyadenylated cytoplasmic late RNA and extended in a 3′ direction in the presence of unlabeled deoxynucleoside triphosphates and reverse transcriptase. Virtually identical gel patterns have been obtained with reverse transcriptase used directly as provided by Dr J. Beard and with enzyme passed through columns of Biogel P200 to remove small quantities of RNase found in certain enzyme preparations. It is possible to identify over 20 individual radiolabeled extended products, most of which are of narrow width and appear to be discrete products. The presence of multiple extension products in this experiment was an unanticipated result, since 19 S late RNA had been thought to be either a unique species

FIG. 2 (legend continued). The polarities of the DNA and RNA strands are 3′ to 5′ in left to right and right to left directions, respectively. Both DNA residue numbers and map units are provided for reference purposes. Arrows within the sequence mark sites of cleavage with HaeIII and AvaII restriction enzymes relevant to presentation of results. Horizontal lines over the sequence represent the span of the DNA sequence represented in specific extension products of late 16 S and 19 S RNAs; extension products synthesized on the 0.765 to 0.772 map unit primer bound to the body of 19 S RNA have been numbered in accord with numbering of the gel bands of Fig. 3(b). Broken lines indicate termini of products which are not precisely determined. Boxes in the RNA sequence enclose sequences of nucleotides which are duplicated at the 3′ termini of leader segments and the 5′ termini of RNA bodies. Ambiguities in assigning the precise termini of leaders and bodies created by this duplication have been arbitrarily solved by assuming that splicing cleavages occur through the G-G and A-G dinucleotides within these duplicated sequences. Hyphens have been omitted for clarity.


FIG. 3. Autoradiograms of the 8% polyacrylamide/7 M-urea gel electrophoretic separations of the 5′ 32P-labeled products obtained by reverse transcriptase catalyzed elongation of the (a) 0.803 to 0.810 map unit HaeIII-Alu and (b) 0.766 to 0.772 map unit (residues 479 to 509 in Fig. 2) AvaII fragments of SV40 DNA annealed to late polyadenylated cytoplasmic RNA. The 2 separations were carried out in adjacent channels of the same gel. The origin and front of the (b) channel electrophoretogram were inadvertently cut from the portion of the autoradiogram presented. The position of the marker HaeIII-D fragment was obtained from its coelectrophoresis with bands 2, 3 and 4 from the (b) channel on a separate gel. In parentheses are given the nucleotide lengths of bands of known size (see Results for size determinations).


of RNA or composed of a limited number of RNA species. In order to assess the significance of this result, we proceeded to perform nucleotide sequence analyses on 15 of the extended products shown in Figure 3(b) and to perform an identical extension with late polyadenylated nuclear RNA as a template. The nucleotide sequence data, which are discussed in detail below, revealed each product to contain a unique sequence of SV40 DNA and demonstrated an absence of minor skips or duplications of nucleotides. The latter finding establishes the fidelity of chain elongation by reverse transcriptase in our transcription system. The results of the extension on nuclear RNA with the 0.765 to 0.772 map unit primer are presented in Figure 4. The nuclear pattern is entirely different from that of cytoplasm. It exhibits one principal product accounting for approximately 80 to 90% of total extended products and migrating in the region of the gel comparable to cytoplasmic band 7, several additional products of greater chain length, and minimal quantities of products migrating with mobilities of the shorter cytoplasmic products, 8 to 16. Analysis of the prominent nuclear extension products showed that they contained sequences colinear with SV40 DNA. The sequence of cytoplasmic product 9 is also colinear with SV40 DNA (see below). Since the cytoplasmic and nuclear RNAs were extracted and subjected to reverse transcription in an identical manner, the absence or virtual absence of product 9 in the nuclear extension suggests that it is not derived in the cytoplasmic extension by either non-specific termination by reverse transcriptase or as a result of degradation of RNA molecules during extraction or transcription. Additional data, discussed below, concerned with 3′-terminal sequences of the extended products we have obtained and nucleotide sequence analyses of 19 S late RNA labeled in vivo also suggest that our principal reverse transcriptase stops are not the result of non-specific premature terminations or RNA degradation. The fact that sucrose gradient patterns of extracted cytoplasmic RNA show typical peaks of 16 S and 19 S RNA (unpublished results) indicates that our cytoplasmic extracts do not consist of heavily degraded RNA; and our finding that RNA extracted from polysomes exhibits an extension pattern with the same bands (in a qualitative sense) as those derived by extension of whole cytoplasmic RNA suggests that the RNAs serving as transcriptional templates are derived from living cells.

The structural analyses performed on bands 11 to 16 from Figure 3(b) are conveniently discussed together. Although only shown for bands 12, 13, 14 and 16 on the DNA degradation gels in Figures 5 and 6, all six start with the sequence . . . T-G-G-A-C-C-T progressing in a 5′ to 3′ direction (from bottom upwards in the gels). This sequence overlaps the 3′ end of the priming DNA fragment, and corresponds to plus strand DNA residues 480 to 474 (0.765 map units). The succeeding sequence T-A-A-C-G-G . . . does not correspond to residue 473 onward, however, but matches the plus strand sequence at only one downstream region, at 0.730 map units, where it is embedded in the sequence G-A-A-C-C-T-T-A-A-C-G-G-A-G (residues 295 to 282, Fig. 2). Thus the RNAs from which these products are transcribed are not colinear transcripts of viral DNA but consist of leader segments bound to the body of 19 S RNA. Interestingly, the sequence A-C-C-T is found at residues 293 to 290 as well as 477 to 474, creating an ambiguity in assigning the precise 5′ terminus of the body of 19 S RNA and 3′ termini of the leaders. As described below (also see Table 1), we have found duplicated sequences at the fusion points of all SV40 RNA leaders and bodies and they all contain the dinucleotide G-G or A-G in the RNA sequence. For ease of presentation we have arbitrarily assumed that cleavages of RNA at these sites occur

FIG. 4. Autoradiogram of the 8% polyacrylamide/7 M-urea gel electrophoretic separations of the 5′ 32P-labeled products obtained by reverse transcriptase catalyzed elongation of the 0.766 to 0.772 map unit AvaII fragment of SV40 DNA annealed to late polyadenylated nuclear (N) and cytoplasmic (C) RNA. Cytoplasmic extended products in the C channel and in Fig. 3(b) have been numbered similarly.

between the two purine nucleotides. Thus we designate residue 291 as the 3′ terminus of the leaders of the RNAs giving rise to bands 11 to 16 and residue 476 as the 5′ terminus of the body of 19 S RNA.
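The ambiguity created by the duplicated tetranucleotide, and the convention used to resolve it, can be made concrete with a short sketch. It is illustrative only. The function names are hypothetical, and the RNA form of the repeat (A-G-G-U, read in order of increasing residue number) is inferred here by complementing the plus-strand sequence A-C-C-T quoted in the text. A repeat of k nucleotides allows k + 1 equally valid junction assignments; the convention places the cut between the two nucleotides of a chosen dinucleotide.

```python
def equivalent_junctions(donor_span, acceptor_span):
    """List every (leader 3' end, body 5' start) pair giving the same RNA.

    Each span is the (lowest, highest) residue of the duplicated sequence,
    e.g. (290, 293) at the leader and (474, 477) at the body.
    """
    d_lo, d_hi = donor_span
    a_lo, a_hi = acceptor_span
    if d_hi - d_lo != a_hi - a_lo:
        raise ValueError("duplicated sequences must have equal length")
    # The cut may fall before, within or after the repeat: k + 1 choices.
    return [(d_lo - 1 + i, a_lo + i) for i in range(d_hi - d_lo + 2)]


def conventional_junction(repeat_rna, dinucleotide, donor_span, acceptor_span):
    """Place the cut between the two bases of `dinucleotide` in the repeat."""
    index = repeat_rna.find(dinucleotide)
    if index == -1:
        raise ValueError("dinucleotide not present in the repeat")
    leader_end = donor_span[0] + index
    body_start = acceptor_span[0] + index + 1
    return leader_end, body_start


if __name__ == "__main__":
    print(equivalent_junctions((290, 293), (474, 477)))
    print(conventional_junction("AGGU", "GG", (290, 293), (474, 477)))
```

With the repeat at residues 290 to 293 and 474 to 477 and the cut placed within G-G, the sketch returns the pair (291, 476) designated in the text.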


TABLE 1
Nucleotide sequences about the 3′ and 5′ termini of donor and acceptor segments involved in early and late SV40 and immunoglobulin (Tonegawa et al., 1978) RNA splicing and the sequences of the final spliced products

RNA (fused nucleotides): Late 19 S (291-476); Late 19 S (444-476); Late 19 S (212-476); Late 16 S (444-1381); Early T (4837-4490); Early t (4657-4490); Immunoglobulin.
Columns: Donor sequence; Acceptor sequence; Sequence across splice.
[The nucleotide sequence entries of this table are illegible in the source.]

Whole numbers refer to residue numbers on the map of the SV40 genome (see Fig. 2 and Reddy et al., 1978b). Vertical lines bracket sequences which are duplicated at the 3′ termini of donor and 5′ termini of acceptor RNA segments. Hyphens have been omitted for clarity.

Beyond residue 282, all extended products continue with the sequence G-C-C-T-G . . . G-A-A-A-T, corresponding to the plus strand sequence from residues 281 to 243. At residue 243, there is a dark band across all four channels of the electrophoretogram for band 16 with no further extension, indicating that this DNA product terminates at this point. However, there is further colinear extension of the other products with terminations between residues 233 to 237, 208 to 212, 194 to 196, 189 to 192 and at residue 182 for products 15 to 11, respectively. The complete expanses of extended products 11 to 16 are summarized in Figure 2 and Table 2.
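The relation between the 3′ end of an extended product and the 5′ end of its RNA template is a reverse complement, which a few lines of code can illustrate. This is a minimal sketch; the names `DNA_TO_RNA` and `template_rna` are introduced here for illustration. It takes a DNA product written 5′ to 3′ and returns the templating RNA, also written 5′ to 3′, so that the last nucleotides of the product (G-A-A-A-T above) give the first nucleotides of the RNA.

```python
# Watson-Crick partner in RNA for each DNA base of the extended product.
DNA_TO_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def template_rna(cdna: str) -> str:
    """Return the RNA (5' to 3') that templated a DNA product written 5' to 3'."""
    # The strands are antiparallel, so read the product backwards.
    return "".join(DNA_TO_RNA[base] for base in reversed(cdna))


if __name__ == "__main__":
    # 3'-terminal nucleotides of the products ending at residue 243.
    print(template_rna("GAAAT"))
```

For the terminal G-A-A-A-T the RNA begins A-U-U, the sequence the text describes as adjacent to capped structures.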

TABLE 2
Structure of SV40 DNA extension products

Extended product      3′ terminus residue (RNA sequence)    Splice (fused nucleotides)
19 S-2                55 (AUU)                              444-476
19 S-3                110 (AUU)                             None
19 S-4                182 (AUU)                             None
19 S-5                182 (AUU)                             444-476
19 S-6                215-220                               None
19 S-7                243 (AUU)                             None
19 S-8                243 (AUU)                             444-476
19 S-9                305-315                               None
19 S-10               85-95                                 212-476
19 S-11               182 (AUU)                             291-476
19 S-12               189-192                               291-476
19 S-13               194-196                               291-476
19 S-14               208-212                               291-476
19 S-15               233-237                               291-476
19 S-16               243 (AUU)                             291-476
18.5 S                548 (AUU)                             None
16 S (principal)      243 (AUU)                             444-1381

Structure of the extension products synthesized on primers bound to the bodies of SV40 late 16 S and 19 S RNAs.
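The chain length expected for an extended product follows from the primer position, the splice (if any) and the 3′ terminus. The sketch below is a hedged illustration rather than the authors' calculation. It assumes that the labeled 5′ end of the AvaII primer lies at residue 509 (the primer spans residues 479 to 509) and that extension proceeds toward lower residue numbers, as the sequence readings in the text imply; the function name `product_length` is hypothetical.

```python
# Assumption: the labeled 5' end of the AvaII primer is at residue 509 and
# extension runs toward lower residue numbers on the map.
PRIMER_5PRIME = 509


def product_length(terminus, splice=None, primer_5prime=PRIMER_5PRIME):
    """Chain length of an extended product ending at residue `terminus`.

    `splice` is None for a colinear product, or a (leader 3' end, body 5'
    start) pair such as (291, 476) for a spliced template.
    """
    if splice is None:
        return primer_5prime - terminus + 1
    leader_end, body_start = splice
    body_part = primer_5prime - body_start + 1   # primer plus body copy
    leader_part = leader_end - terminus + 1      # copy of the leader
    return body_part + leader_part


if __name__ == "__main__":
    print(product_length(243))               # colinear product
    print(product_length(243, (291, 476)))   # spliced, same terminus
```

Two products with the same 3′ terminus therefore migrate differently when one template is spliced and the other is not, which is one reason a single terminus such as residue 243 can appear in several gel bands.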

Of note, two additional 19 S extension products terminate at residue 243 and two at residue 182 (see below) and 16 S RNA extension products also terminate at these loci (Ghosh et al., 1978; Reddy et al., 1979). Furthermore, the sequence in RNA from residues 243 to 245 and 182 to 184 is A-U-U, the principal sequence found adjacent to capped structures in SV40 late mRNA (Lavi & Shatkin, 1975; Aloni, 1977; Groner et al., 1977; Haegeman & Fiers, 1978a). The terminal sequences in RNA of bands 12 to 15 are neither A-U-U nor closely related sequences. Our conclusions concerning the possible relationship of extended products 11 to 16, as well as all other extended products from Figure 3(b), to in vivo RNAs are presented in the Discussion. In contrast to the analyses of extended products 11 to 16, nucleotide sequence analysis of product 10 revealed the sequence T-G-G-A-C-C-T-T-C-T-G-A-G-G . . . at the bottom of the sequencing gel in Figure 6, just beyond the 0.766 to 0.772 map unit primer. The first seven of these nucleotides correspond to nucleotides 480 to 474 of the SV40 genome (Fig. 2). The nucleotides T-C-T-G-A-G-G . . ., however, do not


FIG. 6. Autoradiogram of electrophoretic fractionations of the cleavage products of extended products 10, 12, 13 and 14 of the 0.766 to 0.772 map unit AvaII primer from Fig. 3(b). Products were prepared and fractionated by the method of Maxam & Gilbert (1977). CT and AG above electrophoretic channels refer specifically to extended primers degraded by means which preferentially cleave at cytidylic and thymidylic acid residues and adenylic and guanylic acid residues. DNA sequences are read and identified as described in the legend to Fig. 5.


come from the contiguous stretch of nucleotides but from positions 210 to 204 and onwards on the SV40 sequence. Thus band 10 is derived from an RNA with a gap different from that in bands 11 to 16. As noted for bands 11 to 16, the sequence A-C-C-T is present at the two genomic sites which contribute to the band 10 splice, positions 477 to 474 and 214 to 211. According to the convention we have established, the 5’ terminus of the body of the RNA giving rise to band 10 remains at residue 476 and the 5’ terminus of the leader lies at residue 212. Proceeding beyond residue 204, it is possible to read the DNA sequence of band 10 to approximately residue 135; no further sequence discontinuities are present in this region. It is not possible to read the sequence beyond this point, however. In order to determine the approximate 3’ terminus of this product, it was therefore necessary to estimate its chain length. We therefore constructed a semilogarithmic plot (Fig. 7) relating the chain length of the various

FIG. 7. Semilogarithmic plot relating the chain length of the 32P-labeled extended products of the 0.766 to 0.772 map unit AvaII fragment to their migration (distance migrated, cm) in the 8% polyacrylamide/7 M-urea gel shown in Fig. 3(b). Extended products and markers of known chain length are indicated by filled circles (●); products of unknown size by open circles (○).

19 S extended products to distance migrated from the origin of the polyacrylamide gel in Figure 3(b). As size markers we used bands 11 and 16, 144 and 83 nucleotides in length as determined by the aforementioned analyses, band 5 from the Figure 3(a) co-electrophoresis with a length of 164 nucleotides by direct analysis (see below) and the HaeIII-D fragment, which was also co-electrophoresed and which is 372 nucleotides in length. Bands 7, 8 and 9, with lengths very close to, if not precisely, 267, 236 and 198 nucleotides, as discussed below, were also used as size standards. From this plot, we estimated the chain length of extended product 10 to be approximately 160 nucleotides. Based on this figure and the fact that the gap in this product spans 263 nucleotides, we have placed the 3’ terminus of band 10 between DNA residues 85 and 95. Of note, the sequence A-U-U is not present in RNA copied from this genomic region.

The structures of extended products 7, 8 and 9 are conveniently discussed together. On degradation analysis (Fig. 8) the nucleotide sequence beyond the primer for bands 7 and 9 agrees perfectly with the sequence of nucleotides on the SV40 plus strand from DNA residue 478 to approximately residue 280 to 300, where the sequence

FIG. 8. Autoradiograms of electrophoretic fractionations of the cleavage products of the band 7, 8 and 9 extended products of the 0.766 to 0.772 map unit AvaII primer from Fig. 3(b). Products were prepared and fractionated as described by Maxam & Gilbert (1977). DNA sequences are read and identified as described in the legend to Fig. 5.

pattern can no longer be read accurately. Thus these two products are not spliced in this region. Band 8, on the other hand, contains a gap in its sequence removing DNA residues 445 to 475 (gap not shown, however, on the degradation pattern of Fig. 8). This gap is different from the two gaps present in the extended products previously discussed. However, it is also present in extended products 2 and 5 and is demonstrated for product 5 in Figure 10. The band 8 sequence can also be read accurately only up to approximately residue 275. Since the DNA degradation patterns for these bands could not be read to their 3’ termini, it was necessary to estimate the genomic localizations of these termini by chain length analysis. However, because of the absence of size markers of known nucleotide sequence between the HaeIII-D marker (372 nucleotides) and the Figure

FIG. 9. Autoradiograms of the 8% polyacrylamide/7 M-urea gel electrophoretic separations of the band 2, 3, 6, 7, 8 and 9 32P-labeled extended products from Fig. 3(b) before (left) and after (right) incubation with HaeIII restriction endonuclease. Figures in margins indicate chain lengths of the marker HaeIII-D (372 nucleotides) and Hind-E (174 nucleotides) fragments which were co-electrophoresed in adjacent gel channels and the chain lengths of the HaeIII products of the band 8 and 9 extended products (198 and 229 nucleotides), the derivation of which is discussed in the text.

3(a) band 5 marker (164 nucleotides) in Figure 7, we chose to determine the size of bands 7 to 9 by re-electrophoresing them on 8% polyacrylamide/7 M-urea gels using as size markers their radiolabeled HaeIII digestion products. HaeIII cleaves SV40 DNA at only two loci in the genomic region under discussion, at 0.722 and 0.728 map units (between DNA residues 250 and 251, and 280 and 281), but only digestion products extending from the latter site to the 5’ terminus of the 0.766 to 0.772 map unit AvaII primer at residue 509 carry the 5’-terminal label. Figure 9 shows that bands 7 and 8 are cleaved by HaeIII restriction endonuclease, whereas band 9 is not cleaved by this enzyme. Since band 7 contains a continuous stretch of nucleotides between the 5’ end of the AvaII fragment and the HaeIII site (i.e. between residues

FIG. 10. Autoradiograms of electrophoretic fractionations of the cleavage products of the band 4 and 5 extended products of the 0.766 to 0.772 map unit AvaII primer from Fig. 3(b). Products were prepared and fractionated as described by Maxam & Gilbert (1977). DNA sequences are read and identified as described in the legends to Figs 5 and 6.

509 and 281), the band 7 HaeIII digestion product must be exactly 229 nucleotides in length. The HaeIII digestion product of band 8 must be shorter than the band 7 digestion product by 31 nucleotides, i.e. 198 nucleotides, since it contains the 31-nucleotide gap which removes residues 445 to 475.
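The chain-length arithmetic used here and in the preceding sections can be sketched in a few lines. This is only an illustration, not part of the original analysis: the function name `product_length` and its arguments are hypothetical, and it assumes that a product runs from the labeled 5’ end of the primer (residue 509) toward lower residue numbers and that a gap is given as the inclusive range of residues it removes.

```python
def product_length(five_prime, three_prime, gaps=()):
    """Chain length of an extended DNA product.

    five_prime  -- residue carrying the 5' label (509 for the AvaII primer)
    three_prime -- residue at which the product ends (lower residue number)
    gaps        -- inclusive (low, high) residue ranges removed by splicing
    """
    if three_prime > five_prime:
        raise ValueError("expected three_prime <= five_prime")
    removed = 0
    for low, high in gaps:
        # Each gap must lie entirely inside the span covered by the product.
        if not (three_prime <= low <= high <= five_prime):
            raise ValueError("gap lies outside the product")
        removed += high - low + 1
    return (five_prime - three_prime + 1) - removed


# HaeIII cuts between residues 280 and 281, so labeled digestion products end at 281.
band7_haeiii = product_length(509, 281)                # ungapped
band8_haeiii = product_length(509, 281, [(445, 475)])  # 31-nucleotide gap
print(band7_haeiii, band8_haeiii)
```

The same helper gives the marker lengths quoted in the text for bands 16 and 11 when the 184-nucleotide gap (residues 292 to 475) is supplied.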


It is seen that band 9 co-migrates with the band 8 HaeIII digestion product in Figure 9. Therefore band 9 must contain approximately 198 nucleotides. Since it originates at DNA residue 509 and is an ungapped species, we have placed its 3’ terminus between residues 305 and 315. Undigested band 8 migrates slightly more slowly than the marker product of 229 nucleotides in Figure 9. From comparison of its electrophoretic mobility with that of the 198 and 229-nucleotide markers, we estimate its size to be between 235 and 240 nucleotides. Given the 31-nucleotide gap in band 8, this analysis places the 3’ terminus of this band between residues 239 and 244. Since we have previously localized the 5’ terminus of the major 16 S mRNA leader and the 3’ terminus of band 16, the most abundant 19 S RNA extension product, at residue 243, and since extension of primers bound to SV40 DNA from residues 250 to 300 shows only one stop in this region, at residue 243 (data not presented), it appears most likely that the 3’ terminus of band 8 also lies at residue 243. Similar analysis of the band 7 extended product, using 236 nucleotides as the length of band 8, suggests that undigested band 7 has a chain length of approximately 260 to 265 nucleotides, and thus a 3’ terminus between residues 244 and 249. For the reasons stated above, we believe that the 3’ terminus of band 7 lies at DNA residue 243.

In summary, combined sizing and nucleic acid analysis suggest that bands 9 and 7 are continuous transcripts of late RNAs extending from residues 305 to 315 and 243, respectively, to the 5’ terminus of the priming DNA fragment and that band 8 contains a leader-complementary segment extending from residues 243 to 444 linked to the DNA transcript of the body of 19 S RNA (see Fig. 2 and Table 2). As noted previously, the sequence in RNA from residues 243 to 245 is A-U-U.
This sequence is not present in RNA between positions 305 and 315 but the related sequences A-C-U and A-G-U are both present at this site.

The chain lengths of extended products 2 to 6 were estimated from their migration on 8% polyacrylamide/7 M-urea (Figs 3(b) and 7) to be 430 to 440, 390 to 400, 325 to 330, 305 to 315 and 290 nucleotides, respectively. Band 6 was not present in sufficient quantity for DNA sequence analysis. However, digestion with HaeIII resulted in its conversion to a species migrating with a chain length of 229 nucleotides (Fig. 9). This finding indicates that band 6 is ungapped in the region from nucleotides 280 to 475. Assuming no gap to the 3’ side of nucleotide 280, we can deduce from its chain length that the 3’ terminus of band 6 lies in the region between nucleotides 215 and 220. RNA transcribed from DNA in this region does not contain the sequence A-U-U, but does contain the related sequence A-C-C-U.

On DNA sequencing analysis, band 5 exhibited the sequence G-G-A-C-C-A-G-T-T-A-A (Fig. 10) proceeding in a 3’ direction from the priming AvaII fragment. The first five nucleotides in this sequence correspond to residues 479 to 475 on SV40 DNA; the sequence A-G-T-T-A-A . . ., however, does not come from residues 474 and onwards, but from residues 443 and onwards. Since the sequence A-C-C is present from residues 477 to 475 and 446 to 444, we have by our convention again set the 5’ terminus of the body of the RNA giving rise to this extension product at residue 476 and the 3’ terminus of the leader at residue 444 (Table 2 and Fig. 2). This gap in extended product 5 is the same as that noted previously in product 8. Reading the DNA sequencing pattern of band 5 further, there is no evidence of any additional gap to approximately nucleotide 260, at which point the electrophoretogram loses clarity. The presence of the 31-nucleotide gap from residues 445 to 475, coupled with an


estimated chain length of 305 to 315 nucleotides, suggests that band 5 has a 3’ terminus in the region of nucleotide 170 to 180. Although A-U-U residues are not present in RNA copied from the 160 to 180 nucleotide region, they are present in RNA transcribed from residues 182 to 184, and extension of primers bound to SV40 DNA in the region from residues 190 to 240 shows a very strong stop at residue 180 (data not presented). We therefore believe that band 5 terminates with the complementary T-A-A sequence at residue 182.

The DNA sequence pattern for extended product 4 (Fig. 10) reveals a continuous transcript of DNA from residue 480 to approximately residue 275, where the pattern becomes difficult to read. Given a chain length of approximately 325 to 330 nucleotides, the 3’ terminus of this extended product falls at approximately residue 179 to 184. Again, considering proximity to the A-U-U sequence on RNA copied from residues 182 to 184, we strongly suspect that band 4 has a 3’ terminus at residue 182.

Bands 2 and 3 were not present in quantities sufficient for DNA sequence analysis. However, upon HaeIII digestion, they were converted to species migrating with chain lengths of approximately 198 and 229 nucleotides, respectively (Fig. 9). Thus band 3 contains no gap in the region from nucleotides 280 to 475, while band 2 appears to contain the characteristic 31-nucleotide gap from residues 445 to 475. Coupling these data with the chain lengths of these extended products, and assuming no gap to the 3’ side of residue 280, we have estimated that the 3’ termini of bands 2 and 3 lie, respectively, between residues 40 and 50, and 110 and 120. Nucleotide sequence analysis of the unfractionated products of several primers in the late leader region has demonstrated strong terminations at residues 55 and 110 (data not presented), both sites of the familiar A-U-U sequence in RNA (Fig. 2).
We therefore suspect that the 3’ termini of extended products 2 and 3 lie at residues 55 and 110. Our conclusions concerning the genomic origins of products 2 to 6 are summarized in Table 2 and Figure 2.

Extended product 1 did not move significantly from the origin of the 8% polyacrylamide gel of Figure 3(b). Upon DNA sequence analysis, it gave a complex pattern of a mixture of DNAs and upon re-electrophoresis on gels of lesser polyacrylamide concentration it gave a number of discrete bands, all greater than 450 nucleotides in length and all containing too little radioactivity to permit further analysis.

In an attempt to define any additional reverse transcriptase stops to the 3’ side of DNA residue 55 (0.685 map units), we chose to perform a primer extension experiment using as primer the EcoRII-N fragment which spans the genomic plus strand from residues 100 to 154 (0.693 to 0.704 map units) (Fig. 2). The distribution of extended products on 8% polyacrylamide/7 M-urea gel electrophoresis is shown in Figure 11(a). By comparison with co-electrophoresed size markers, it was possible to estimate the length of these extended products to be from 65 to greater than 340 nucleotides. Band 4 was one of the few bands present in sufficient quantity for DNA sequence analysis. The degradative pattern, shown in Figure 11(b), reveals a sequence extending continuously from residue 100 to residue 5189 on the SV40 sequence. Thus late RNA molecules extend up to residue 5189 or beyond, and at least some are unspliced in the region from residues 100 to 5189. It is not possible to tell if RNAs which extend up to or beyond residue 5189 are spliced or unspliced between residues 475 and 100 and, if the former, which of the demonstrated splices they contain. Of note, the RNA sequence at residues 5189 to 5191 is A-U-U, and so there is reason to suspect that one or more in vivo RNA species exist with 5’ termini at this locus.
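The placement of 3’ termini for bands 2 to 6 and 8 from estimated chain lengths follows a simple pattern: convert the length range (plus any gap) into a window of residues, then look for a nearby A-U-U site. The sketch below illustrates that reasoning only; the names `three_prime_window` and `nearest_site` are hypothetical, the site list is limited to the A-U-U positions cited in the text for this primer, and the windows it yields need not coincide exactly with the ranges quoted above, which rest on the authors' own reading of the gels.

```python
PRIMER_5P = 509                   # labeled 5' end of the AvaII primer (from the text)
AUU_SITES = (55, 110, 182, 243)   # A-U-U sites cited in the text for this region


def three_prime_window(length_lo, length_hi, gap=0, five_prime=PRIMER_5P):
    """Residue window for the 3' terminus given a chain-length range and gap size."""
    far = five_prime - (length_hi + gap) + 1    # longest product ends furthest away
    near = five_prime - (length_lo + gap) + 1   # shortest product ends closest
    return far, near


def nearest_site(window, sites=AUU_SITES):
    """Return the candidate site inside the window, or else the one closest to it."""
    lo, hi = window

    def distance(site):
        if lo <= site <= hi:
            return 0
        return min(abs(site - lo), abs(site - hi))

    return min(sites, key=distance)


band3 = three_prime_window(390, 400)           # ungapped
band8 = three_prime_window(235, 240, gap=31)   # 31-nucleotide gap
print(band3, nearest_site(band3))
print(band8, nearest_site(band8))
```

A nearest-site rule is of course only a heuristic; in the text the assignment is additionally supported by independent observations of reverse transcriptase stops at the same residues.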

FIG. 11. (a) Autoradiogram of the 8% polyacrylamide/7 M-urea gel electrophoretic separation of the 5’ 32P-labeled products obtained by reverse transcriptase-catalyzed elongation of the EcoRII-N fragment of SV40 DNA (0.693 to 0.704 map units, residues 100 to 154 in Fig. 2) annealed to late polyadenylated cytoplasmic RNA. Denatured SV40 DNA fragments of 109, 148 and 276 nucleotides were co-electrophoresed on the same gel as size markers. The first fragment was obtained by Hinf restriction enzyme digestion and spanned 0.512 to 0.533 map units (DNA residues 4379 to 4488); the second was obtained by combined Alu and HaeIII enzyme digestions and extended from 0.166 to 0.194 map units (residues 2570 to 2718); the largest fragment was generated by combined digestion with HaeIII and HindII, HindIII enzymes and spanned 0.271 to 0.324 map units (DNA residues 3121 to 3396). (b) Autoradiogram of the electrophoretic fractionation of the cleavage products of extended product 4 from Fig. 11(a). Products were prepared and fractionated by the method of Maxam & Gilbert (1977). DNA sequences are read and identified as described in the legends to Figs 5 and 6.


The structures of bands both larger and smaller than band 4 are currently under investigation. It is likely that many of the latter are derived from RNAs with termini between residues 55 and 5189. These analyses have been complicated by the fact that the EcoRII-N fragment binds to SV40 DNA between residues 100 and 154 and between the tandemly repeated residues 45 and 99. Preliminary analysis of band 6 suggests that it originates at residue 99 and extends to residue 5189. Bands 1, 2 and 3 are barely detectable and have not been present in quantities adequate for analyses.

(b) Extension of the 0.802 to 0.810 map unit primer

In an attempt to identify a viral RNA containing the VP3 but not the VP2 initiation sites, we bound a fragment of viral DNA spanning the 0.803 to 0.810 map unit region (DNA residues 676 to 711) to late cytoplasmic polyadenylated RNA and extended it in a 3’ direction as previously described. The polyacrylamide gel electrophoretic pattern of the extended products is shown in Figure 3(a). Five principal extended products are seen, one migrating slightly slower than extension product 10 and another four migrating between extended products 2 to 7 of the 0.766 to 0.772 map unit AvaII fragment (Fig. 3(b)). All are very sharp, suggesting that they are discrete products. When band 5 was subjected to DNA sequence analysis, a clear electrophoretogram (Fig. 12) was obtained. The sequence at the bottom of the electrophoretogram, T-A-G-G-C-C-T-G-T, corresponds to nucleotides 679 to 671 at and to the 3’ side of the priming fragment. The subsequent sequence can be read through the full electrophoretogram and agrees perfectly with that of SV40 DNA up to a unique termination at DNA residue 548 (0.779 map units). Thus the band 5 extended product is colinear with SV40 DNA, extends from residues 548 to 711 and is 164 nucleotides in length.
Furthermore, the DNA sequence from residue 548 to 550 is T-A-A and thus the RNA giving rise to this extended product contains the familiar A-U-U sequence at this site. When a fragment extending from residues 792 to 886 containing the VP3 initiation codon was used for a similar reverse transcription, a product colinear with SV40 with a 3’ terminus at residue 548 was also obtained. If the reverse transcriptase stop at residue 548 corresponds to the 5’ terminus of a discrete in vivo RNA, such an RNA would not contain the VP2 initiation codon at 0.765 map units and could potentially serve as a template for VP3 synthesis. For convenience, we refer subsequently to this putative RNA species as 18.5 S RNA.

From their migrations with respect to the extended products of the 0.766 to 0.772 map unit AvaII fragment, bands 1 to 4 of Figure 3(a) have chain lengths varying from about 290 to 425 nucleotides. Since these extended products have 5’ termini at residue 711, they must extend to approximately residue 420 and beyond. Preliminary DNA sequencing analyses have revealed no evidence of sequence discontinuities in the region from residues 675 to 500 but the patterns have not been clear enough to read accurately beyond this point. In any event the RNAs giving rise to these bands all contain the VP2 as well as the VP3 initiation site, and they very likely correspond to various species of 19 S RNA.
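The argument that an RNA beginning at residue 548 lacks the VP2 initiation codon, whereas the gapped and ungapped 19 S species retain it, amounts to an interval check. The following sketch is illustrative only: the function `retains` is a hypothetical helper, it takes the VP2 AUG to lie at residues 480 to 482 as stated elsewhere in the paper, it assumes the RNA continues 3’ of the region being tested, and it does not handle 5’ termini that wrap around the circular genome (such as residue 5189).

```python
VP2_AUG = (480, 482)  # residues of the VP2 initiation codon, as given in the paper


def retains(region, five_prime_terminus, gaps=()):
    """True if an RNA starting at `five_prime_terminus` still contains `region`.

    region -- inclusive (low, high) residue range on the RNA
    gaps   -- inclusive (low, high) residue ranges removed by splicing
    """
    low, high = region
    if low < five_prime_terminus:
        return False  # region begins upstream of the RNA's 5' terminus
    # The region is lost if any spliced-out range overlaps it.
    return not any(g_low <= high and low <= g_high for g_low, g_high in gaps)


print(retains(VP2_AUG, 548))                # putative 18.5 S RNA
print(retains(VP2_AUG, 243, [(292, 475)]))  # 19 S species with the long gap
print(retains(VP2_AUG, 243, [(445, 475)]))  # 19 S species with the short gap
```

The check says nothing about whether a retained codon is accessible to ribosomes, which is a separate question.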

4. Discussion

The primer extension methodology we have used in these studies is a powerful and efficient tool for determining the nucleotide sequences of RNAs. It is especially useful for investigating RNAs which are present in cells in extremely small quantities,

FIG. 12. Autoradiogram of the electrophoretic fractionation of the cleavage products of the band 5 extended product of the 0.803 to 0.810 map unit HaeIII-Alu primer from Fig. 3(a). Products were prepared and fractionated according to Maxam & Gilbert (1977). DNA sequences are read and identified as described in the legends to Figs 5 and 6. Residue 668 was an A in the virus stock used for DNA sequence analysis but is a G in the isolate used for the present studies.

for investigating the individual components of complex mixtures of RNAs, and when the sequence of template DNA is known, for precisely establishing the sites on RNAs which have undergone splicing. Although it may be difficult to distinguish reverse transcriptase stops at the 5’ termini of template RNAs from premature stops at


other sites, this method may also be useful for identifying the 5’ termini of RNAs in systems in which corroborative evidence is available.

Although we have observed premature reverse transcriptase stops in transcriptions carried out on early SV40 mRNA templates, these have occurred relatively infrequently, i.e. at the level of 2 to 3%, and most of the information we have accumulated bearing on the significance of reverse transcriptase stops suggests that in our transcriptional system the principal and likely some of the minor reverse transcriptase stops are specific and occur at the 5’ termini of in vivo RNAs. The following four points offer the strongest support.

(1) A number of sites at which reverse transcription stops on one form of late RNA are read-through in extensions on other forms of RNA. Specifically, stops at residues 400, 383, 366, 352 and 331 in extensions of primers bound to the body of 16 S RNA (Reddy et al., 1979) are completely read-through in extensions of primers bound to the 5’ end of the body of 19 S; stops at three sites from residues 190 to 205 in extensions on 19 S templates are read-through in extensions on 16 S templates; and a stop between residues 305 and 315 in extensions of cytoplasmic 19 S RNA is read-through in extensions carried out with an identical primer on nuclear RNA. These data suggest that reverse transcriptase terminations at these sites are specific.

(2) The 3’ terminus of the principal extended product of 16 S RNA at residue 243 (0.721 map units) (Ghosh et al., 1978) is in complete agreement with the 5’ terminus of the principal 16 S RNA determined by direct analysis of RNA labeled in vivo (Dhar et al., 1977; Celma et al., 1977a; Haegeman & Fiers, 1978b; P. Ghosh, P. Lebowitz & S. Weissman, unpublished observations). The fact that the reverse transcriptase stop at residue 243 in the extension on the major 16 S is specific and corresponds to an in vivo 5’ terminus strongly suggests that it is also specific in extensions on the 19 S RNAs and identifies the 5’ termini of discrete RNA species giving rise to extended products 7, 8 and 16. Fingerprints of formamide gel-purified 19 S RNA labeled in vivo lack the specific T1 RNase oligonucleotide from residues 241 to 249, but contain oligonucleotides from positions beyond residue 249 (Thimmapaya et al., 1978), also indicating the presence of a major 5’ terminus for 19 S RNA at approximately 0.721 map unit.

(3) Products derived from residues 5090 to 154 are scarce, while products derived from residues 154 to 200 are present in greater but variable abundance in RNase digests of late polyadenylated cytoplasmic RNA labeled in vivo (Dhar et al., 1977). These data suggest a second 5’ terminus for late RNAs between residues 154 and 200 and are consistent with this terminus lying at residue 182, the 3’ terminus of 19 S extension products 4, 5 and 11.

(4) The principal 5’-terminal capped sequence of SV40 late mRNAs is known to be 7mGpppmApUp(Up) (Lavi & Shatkin, 1975; Aloni, 1977; Groner et al., 1977; Haegeman & Fiers, 1978a) and the major 16 S extended product (Ghosh et al., 1978), four of eight additional 16 S extended products (Reddy et al., 1979), eight of the 16 19 S extended products we have examined and the extended product with a 3’ terminus at residue 548 all terminate with the complementary sequences T-A-A or T-A. Moreover, non-specific termination at all A-U-U sequences on RNA may be ruled out, since extensions of primers bound to the body of 19 S fail to stop at two A-U-U sequences at 0.745 and 0.765 map units, upstream from the 3’ termini we have demonstrated for the extension products of 19 S RNA.
On the basis of this evidence, we believe that the 19 S extended products we have obtained which terminate at residues 243 and 182 with the sequence T-A-A are derived from discrete in vivo RNAs, probably capped, with 5’ termini at these


positions. We, in addition, think it is very probable that extended products with 3’ termini at other sites bearing the T-A-A sequence, i.e. residues 5189, 55, 110 and 548, are also transcribed from in vivo RNAs terminating at these loci. However, additional experimental evidence will be required to prove this contention and to determine if any non-T-A(A) reverse transcriptase stops mark the 5’ termini of additional species of 19 S RNA. In this regard, we are presently analysing capped oligonucleotides derived from T1 RNase digestion of 32P-labeled late SV40 RNA to determine if sequences other than A-U(U) lie adjacent to terminal caps.

The structures of the DNA products we have synthesized by extending a primer bound to the 5’ terminus of the body of 19 S RNA are presented in Table 2 and Figures 2 and 13. These products may be grouped into four distinct classes on the basis of whether they are colinear with SV40 DNA or gapped and the span of their gaps. Since we have no reason to doubt the fidelity of chain elongation by reverse transcriptase in our experiments, these four classes of extension products indicate the presence of four families of 19 S RNAs.

The first class of extension products is characterized by the presence of a gap of 184 nucleotides which extends inclusively from residue 475, adjacent to the 5’ terminus of the body of 19 S RNA, to residue 292. A total of six extension products, 11 to 16, fall into this class and they differ from one another only in their extension in a 3’ direction. Together they represent

FIG. 13. Genomic expanses of the principal products synthesized on primers bound to the 5’ termini of the bodies of 16 S and 19 S late RNA. Broken lines indicate the extension in a 3’ direction of additional extended products which have not yet been fully characterized. Dotted lines indicate segments of RNA removed by splicing. The map unit and DNA residue co-ordinates are given for the 3’ termini of these products and points at which splicing occurs. The specific product with a 3’ terminus at residue 548 is identified as an extension product of a presumptive 18.5 S RNA to distinguish it from the other 19 S RNA extension products. Extension products synthesized on the 0.765 to 0.772 map unit primer bound to the body of 19 S RNA have been numbered in accord with the numbering of gel bands in Fig. 3(b).


the most abundant group of products and 16 is the most abundant of all the extension products synthesized. Products 11 and 16 terminate with the sequence T-A-A at residues 182 and 243 and, for reasons stated above, we believe they represent transcripts of in vivo RNAs. Furthermore, these two products are also abundant in polysomal RNA (unpublished observations) and thus they probably are the transcripts of bona fide mRNAs. It is not clear whether extended products 12 to 15 are also copies of in vivo RNAs. The fact that they terminate with sequences other than T-A-A does not rule out their derivation from discrete in vivo RNAs with 5’-terminal sequences other than A-U-U, since we have recently found that the most abundant species of early RNA do not terminate with this sequence, the late mRNAs of polyoma virus terminate with a number of different 5’ capped structures (A. Flavel, S. Legon, A. Cowie & R. Kamen, unpublished work) and preliminary experiments by our group have shown a number of capped T1 RNase oligonucleotides at the 5’ termini of 32P-labeled late SV40 RNA.

The second class of extension products is comprised of three members, products 2, 5 and 8. The characteristic feature of this class of products is the presence of a short gap removing residues 445 to 475. All three products terminate with the sequence T-A-A, at residues 55, 182 and 243, respectively. We believe that the latter two products are derived from discrete species of in vivo RNA and we suspect that the same holds true for product 2 as well. The RNA giving rise to product 8 is present in polysomal extracts and, interestingly, contains the same leader as the principal species of 16 S RNA (Ghosh et al., 1978; Table 2, Figs 2 and 13).

Extended product 10 is the sole member of the third class of 19 S RNA extension products. The distinguishing feature of this product is the presence of a 263-nucleotide gap removing residues 213 to 475.
It is not clear whether the RNA giving rise to this product has its 5’ terminus between residues 85 and 95 or at a site closer to the origin of DNA replication. However, since this product is present in transcripts of polysomal RNA, the RNA from which it is copied is probably an mRNA.

The final class of extension products is composed of five members (3, 4, 6, 7 and 9) which are colinear copies of SV40 DNA from the 5’ terminus of the body of 19 S RNA to their 3’ termini. Products 3, 4 and 7 terminate with the T-A-A sequence at residues 110, 182 and 243, respectively. We believe that the latter two and probably the first as well are transcripts of in vivo species of RNA with 5’ termini at these loci. On the other hand, we do not know at this time whether the 3’ termini of products 6 and 9, at residues 215 to 220 and 305 to 315, respectively, also correspond to the 5’ termini of specific in vivo RNA species. Of note, products 7 and 9 are present in moderate abundance among the 19 S RNA extension products and are seen in transcripts of polysomal RNA. Thus, the RNA(s) from which they are transcribed, although ungapped, probably serve as mRNAs. On the other hand, products 3, 4 and 6 are relatively scarce. In view of the fact that the reverse transcription of the 0.765 to 0.772 map unit AvaII primer bound to nuclear RNA yields principally extended products in the size range of products 4 to 7, consideration must be given to the possibility that the RNA(s) from which these products are copied is primarily nuclear in origin.

Our identification of an extended product with a 3’ terminus at residue 5189 places the 5’ terminus of at least one species of late RNA within the origin of DNA replication or beyond. From present evidence, we do not know if this RNA is a colinear transcript of SV40 DNA or a gapped species.
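The grouping of the 19 S extension products into four classes can be reproduced mechanically from the values listed in Table 2. The sketch below is only an illustration of that bookkeeping: the dictionary layout and the names `TABLE_2`, `group_by_splice` and `gap_length` are hypothetical, and a splice written (a, b) is taken to fuse residue a to residue b, so that the residues removed are those strictly between them.

```python
from collections import defaultdict

# Extended product -> (3' terminus as listed in Table 2, A-U-U at terminus?, splice).
# A splice of None means the product is colinear with SV40 DNA.
TABLE_2 = {
    2: ("55", True, (444, 476)),
    3: ("110", True, None),
    4: ("182", True, None),
    5: ("182", True, (444, 476)),
    6: ("215-220", False, None),
    7: ("243", True, None),
    8: ("243", True, (444, 476)),
    9: ("305-315", False, None),
    10: ("85-95", False, (212, 476)),
    11: ("182", True, (291, 476)),
    12: ("189-192", False, (291, 476)),
    13: ("194-196", False, (291, 476)),
    14: ("208-212", False, (291, 476)),
    15: ("233-237", False, (291, 476)),
    16: ("243", True, (291, 476)),
}


def group_by_splice(table):
    """Collect product numbers under the splice they carry."""
    classes = defaultdict(list)
    for product, (_terminus, _has_auu, splice) in sorted(table.items()):
        classes[splice].append(product)
    return dict(classes)


def gap_length(splice):
    """Number of residues removed when residue `donor` is fused to `acceptor`."""
    donor, acceptor = splice
    return acceptor - donor - 1


classes = group_by_splice(TABLE_2)
auu_count = sum(1 for _t, has_auu, _s in TABLE_2.values() if has_auu)

for splice, members in classes.items():
    label = "colinear" if splice is None else f"gap of {gap_length(splice)} nt"
    print(label, members)
print(auu_count, "of", len(TABLE_2), "products end at an A-U-U site")
```

Keeping the 3’ terminus as a string preserves the ranges given for products whose termini were placed only approximately.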


The biological significance of multiple species of 19 S RNA is not known. It appears likely that certain species, primarily gapped but possibly also ungapped, serve as precursors for other species. The RNA with its 5’ terminus at residue 5189 or beyond is an especially good candidate for a precursor of other late RNA species. Since the gapped RNAs fall mainly into two classes, it is tempting to consider that one class serves as templates for VP2 and the second for VP3 synthesis. In this regard, it is noteworthy that species with the gap removing residues 445 to 475 contain sequences which can base-pair with the VP2 initiation codon on the body of 19 S RNA (Fig. 14), while

[Base-pairing diagram not reproduced: the sequence around residues 470 to 481 is shown paired with a segment around residues 300 to 310.]

FIG. 14. A model showing base-pairing between sequences at and adjacent to the 5’ terminus of the body of 19 S RNA (transcribed from residues 465 to 484) to a proximate segment of RNA which is retained in 19 S species lacking any gap or containing the small gap between residues 444 and 476. It is proposed that this base-pairing may make the AUG VP2 translation initiator at residues 480 to 482 inaccessible to ribosomes.

species with the gap removing residues 291 to 476 lack these sequences. It thus seems conceivable that RNAs with the shorter gap contain an inaccessible VP2 initiation codon and serve as templates for VP3 synthesis, while species with the longer gap contain an accessible VP2 initiation codon and code for this protein.

In addition to providing evidence for the existence of multiple species of 19 S RNA, our extension experiments also suggest the existence of a previously unrecognized RNA with a 5’ terminus at residue 548 (Table 2, Figs 2 and 13). This putative RNA, designated for convenience 18.5 S RNA, is interesting in three structural respects. First, its 5’-terminal sequences are ungapped. Secondly, it also has a 5’ terminus of A-U-U. Thirdly, it lacks the VP2 initiation codon at residues 480 to 482, which is present on all the 19 S RNA species. Thus, it has qualifications to serve as a template for VP3 synthesis and may be comparable to the 18 S RNA species isolated from polyoma virus late mRNA which codes in vitro for polyoma VP3 synthesis (W. Gibson, personal communication; A. Smith, personal communication). However, if it exists in vivo, it is present in such small quantity that it is unlikely to be the sole mRNA directing the synthesis of VP3.

The presence of multiple late RNA species with 5’ A-U-U termini transcribed from a number of different genomic sites raises the question of whether all these 5’ termini arise by post-transcriptional processing of a single primary transcript or whether at least some may be primary transcription initiation sites. The facts that 5’-terminal capped structures may mark sites at which transcription is initiated (Wei & Moss, 1977) or at which post-transcriptional cleavages take place (Schibler & Perry, 1976) and that the A-U-U sequence is capped (Lavi & Shatkin, 1975; Aloni, 1976; Groner et al., 1977; Haegeman & Fiers, 1978a) are consistent with these termini arising by either mechanism.
In addition, the presence of the A-U-U sequence at the 5’ termini of a number of late RNA species is consistent with its being a part of a signal specifying an adjacent post-transcriptional processing step or with its complement on DNA, T-A-A, playing a role in signaling transcription initiation. However, the presence of T-A-A sequences on the plus strand of viral DNA at two

locations within the 0.67 to 0.77 map unit region (residues 366 to 368 and 469 to 471) at which 5’ termini of late RNAs have not mapped suggests that by itself this sequence on DNA, or the A-U-U sequence on RNA, is insufficient to signal transcription initiation or a post-transcriptional processing step, respectively. (Alternatively, but less likely, RNAs with 5’ termini derived from these two T-A-A sites may not be stable.) Comparison of sequences on the plus strand of DNA beyond the five T-A-A sites marking the 3’ termini of the late SV40 extension products does not reveal significant homologies. Examination of these sequences also does not reveal any similarities to the nucleotide sequences of known bacterial promoters. Although these negative findings argue against initiation of transcription from these sites, they do not entirely rule out these sites as promoters for initiation of transcription. Moreover, one can envision novel mechanisms for transcription initiation at multiple sites in this genomic region based on the recent work of McMacken et al. (1977) and Schertzinger et al. (1977) on initiation of DNA synthesis in phages φX174 and T7. In these systems DNA replication depends upon two enzymes, a primase which catalyzes the synthesis of short ribonucleotide primers (in the T7 case predominantly with the sequence of pppApCpCp) and DNA polymerase which is involved in both initiation and extension of these primers. In particular, mechanisms have been proposed whereby a mobile DNA binding protein serves as a “promoter” for primer synthesis not only at the origin of replication but also at other sites, allowing for synthesis of short, discontinuous Okazaki DNA fragments. It is conceivable that a mechanism exists for the synthesis of A-U-U primers on viral plus strand T-A-A sequences by a cellular primase with subsequent extension by a cellular RNA rather than DNA polymerase.
Alternatively, A-U-U primers could be extended by both enzymes, providing a mechanism for coupling the initiation of DNA synthesis and late RNA transcription. Finally, A-U-U primers could be synthesized independent of a template, but subsequently bind to certain T-A-A sites on the viral plus strand and undergo ribonucleotide extension. The recent findings of two transcription initiation sites (albeit RNA polymerase III sites) for adenovirus VA RNA both in vivo and in vitro (Celma et al., 1977b; Vennstrom et al., 1978) provide some precedent for the presence of multiple proximate transcription initiation sites within the leader region for the late RNAs of SV40.

The mechanism by which gaps in nucleotide sequence arise in “mature” RNAs is presently the subject of intensive investigation. Some form of discontinuous transcription seems very unlikely since continuous transcripts as well as transcripts lacking internal RNA segments are present among early (V. B. Reddy, P. K. Ghosh, P. Lebowitz & S. M. Weissman, manuscript in preparation) and late SV40 RNAs, adenovirus RNAs (Bachenheimer & Darnell, 1975; Berget et al., 1977; Chou et al., 1977; Klessis, 1977; Weber et al., 1977) and globin mRNAs (Tilghman et al., 1978; J. Mertz, personal communication). Independent synthesis of the bodies of messages and leader segments with subsequent fusion also seems unlikely as unfused leaders and RNA bodies have not been detected in cellular RNAs; in addition, our data showing different leaders on SV40 late 16 S and 19 S RNAs are incompatible with a mechanism involving independent synthesis and non-specific fusion of leaders and RNA bodies. The final possibility is that gapped RNAs arise from continuous transcripts by post-transcriptional excision of an internal segment of RNA followed by ligation of the RNA chains.
Such a mechanism is supported by the recent finding that certain yeast precursors can be converted to mature tRNAs in vitro by a cellular enzyme which excises a 14 to 17-nucleotide long internal segment of RNA located

within the anti-codon loop region (Knapp et al., 1978) and by evidence from u.v. mapping and other data indicating that there is a common promoter for most late mRNAs in cells infected with adenoviruses (Evans et al., 1977; Goldberg et al., 1977). Unless the mechanisms for formation of spliced RNAs are diverse, these data suggest that the formation of the mature polyadenylated RNA species in SV40-infected cells involves post-transcriptional processing of precursor RNA molecules, including cutting and religation.

If splicing occurs by post-transcriptional excision of RNA segments and ligation of the newly created 5’ and 3’ ends of RNA chains, then signals must exist on the unspliced RNA precursors (or possibly on nucleoprotein complex precursors) which specify points at which splicing is to occur. The fact that SV40 mutants containing deletions of DNA coding for internal sequences of the principal 16 S mRNA leader still direct synthesis of a gapped 16 S mRNA and are viable in the absence of a helper virus (R. Dhar, personal communication) suggests that sequences near the 3’ end of leaders and not internal leader sequences take part in the signals which specify sites at which splicing occurs. For the SV40 late RNAs we have identified four different splices: one in 16 S RNA fusing nucleotides 444 to 1381, and three in 19 S RNA fusing nucleotides 212 to 476, 291 to 476 and 444 to 476. For SV40 early mRNAs, we have identified two specific splices, involving a common 5’ acceptor (body) sequence but two different 3’ donor (leader) sequences. Table 1 shows the specific nucleotide sequences about the 3’ and 5’ termini of donor and acceptor segments for these six RNAs and the nucleotide sequences about the final splices. Comparison of these and adjacent sequences reveals similarities which may contribute to processing signals. First, for each of the six splices we have defined, as well as a splice suggested for an immunoglobulin mRNA (Tonegawa et al., 1978),
short runs of two to four identical nucleotides lie at the 3’ ends of the donor and 5’ ends of the acceptor RNA segments. As noted previously, these sequence duplications create ambiguities in assigning the precise termini of donor and acceptor stretches of RNA. The ambiguous sequences in the seven combinations of donors and acceptors are C-A-G-C, A-G, G-G, G-G-U and A-G-G-U. Thus, the minimum overlap between donors and acceptors so far found seems to be the dinucleotides A-G or G-G. The presence of identical sequences containing either of these dipurines at the 5’ and 3’ ends of acceptors and donors suggests that a single enzyme catalyzes the two cleavages required for excision of an intervening segment of RNA. Secondly, preceding the 5’ end of each of the acceptor segments of RNA are stretches of nucleotides which are uridylic acid-rich and which include three or four pyrimidines immediately preceding the duplicated bases. In addition, the sequences C-C-A or C-C-U are present beyond the 3’ end of the duplicated nucleotides of acceptors (Table 1).

In addition to these features of the primary sequences, possibilities exist for extensive base-pairings within both donors and acceptors and to some extent between donors and acceptors. Furthermore, sequences within excised RNA segments can either base-pair with themselves or with sequences within the donors and/or acceptors, resulting in the approximation of sites at which cleavages occur. In Figures 15 and 16 we present one pair of hypothetical models of 16 S and 19 S RNA precursors with extensive base-pairings which demonstrate approximation of segments of RNA involved in the fusions of residues 444 to 1381 and 291 to 476, respectively. Comparable figures have also been constructed for the other two splices we have identified in the late SV40 RNAs. With sites destined for cleavage held in close approximation

FIG. 15. A base-pairing model of a portion of the 5’-terminal region of ungapped late RNA which undergoes splicing to yield 16 S RNA with fusion of residues 444 to 1381. The model has been constructed to provide extensive base-pairing in this region and approximation of the sequences which undergo cleavage and splicing. Considerations of RNA thermostability as discussed by Gralla & Crothers (1973a,b) and Uhlenbeck et al. (1973) were taken into account in constructing this model. G·C and A·U base-pairs are marked by heavy dots and G·U pairs by light dots. Duplicated sequences at the 3’ and 5’ termini of donor and acceptor RNA segments are marked with arrows.

FIG. 16. A base-pairing model of a portion of the 5’-terminal region of ungapped 19 S RNA which undergoes splicing to yield RNAs with fusion of residues 291 to 476. The model has been constructed to provide for maximal base-pairing in this region and approximation of the sequences which undergo cleavage and splicing. Considerations of RNA thermostability as discussed by Gralla & Crothers (1973a,b) and Uhlenbeck et al. (1973) were taken into account in constructing this model. G·C and A·U base-pairs are marked by heavy dots and G·U pairs by light dots. Duplicated sequences at the 3’ and 5’ termini of donor and acceptor RNA segments are marked with arrows.

by secondary structure in these models, one can envisage mechanisms by which either one or two enzyme molecules catalyze cleavages at two relatively nearby sites with sequence similarity followed by fusion of proximate 3’ donor and 5’ acceptor termini. We are especially interested in models involving approximation of donor and acceptor cleavage sites since the aforementioned finding of different 16 S and 19 S RNA leaders suggests that cleavages and fusions do not occur in a random manner but are co-ordinated so that formation of leaders and acceptor segments of RNA is

tightly linked to their subsequent fusion. Models for splicing involving secondary structure of precursor RNA molecules are attractive from one additional point of view; specifically, it is possible that conformational rearrangements which may accompany the splicing reaction result in a decrease in free energy which may provide all or a portion of the energy requirements for this reaction. Experimental evidence is needed to test for specific secondary structures in these RNAs.

In this discussion, we have implied that primary genomic transcripts are the substrates for each one of the splicing events we have defined; alternatively, however, it is possible that RNAs containing small gaps, e.g. the gap which removes residues 445 to 475, may undergo second or even third rounds of splicing to yield mature RNAs with larger gaps. However, SV40 early mutants with deletions in genomic regions whose transcripts are spliced out of mature RNAs synthesize structurally normal large T antigen and are viable in the absence of helper viruses (Shenk et al., 1976; Sleigh et al., 1978; T. Shenk, personal communication), suggesting that there is no absolute requirement for sequential processing steps in the creation of large gaps. In either case, the relative abundance of different RNA species could be a consequence of either the relative efficiency with which various splicings occur or the stability of the gapped RNAs.

It has been suggested previously that 16 S late mRNA may be derived from a 19 S RNA precursor by a cytoplasmic post-transcriptional processing event.
This contention has been based on the findings that the portion of the viral genome coding for the 3’ terminus of the body of 19 S RNA also codes for the body of the 16 S species (Weinberg & Newbold, 1974; Khoury et al., 1976; May et al., 1977), that 19 S RNA becomes labeled more rapidly and turns over more rapidly than 16 S mRNA in infected whole cells (Weinberg et al., 1974) and that SV40-infected cells which have undergone enucleation show a decline in prelabeled 19 S RNA with a concomitant increase in labeled 16 S mRNA (Aloni et al., 1975). In view of the comparative structures of the principal 16 S mRNA and the multiple species of 19 S RNA, it is clear that the 16 S species cannot be a processed derivative of the major 19 S RNA (containing the leader from nucleotides 243 to 291 attached to the body of 19 S RNA). However, it could conceivably arise by processing of any of the ungapped 19 S RNAs or several of the less abundant gapped 19 S species. We have recently observed that, in addition to the principal species of 16 S mRNA, a number of minor species of this RNA are synthesized in permissive cells late in lytic infection. Most of these species contain longer leaders than the major species and a gap within these leaders removing nucleotides 212 to 351 (Reddy et al., 1979). In contrast, leaders containing this internal gap have not been found among the 19 S RNAs and all or most of the sequence from nucleotides 212 to 351 is present in those 19 S RNAs with the longest leaders. This finding raises the interesting possibility that the presence of this sequence of nucleotides may inhibit a subsequent processing step leading to the creation of the large primary gap in 16 S mRNA and may favor alternative processing steps leading to formation of various forms of 19 S RNA.
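The precursor-product argument made earlier in this paragraph can be restated as a simple interval check. The sketch below is illustrative only. It models each 19 S species as the set of genomic residues it retains and assumes, as a simplification not stated in the text, that a species could give rise to the principal 16 S mRNA by further excision only if it still retains both residue 444 and residue 1381, the two residues fused in the 16 S splice. The value `BODY_END` and the 5’ ends assigned to the species other than the major one are hypothetical placeholders.

```python
# Illustrative sketch: which 19 S species still retain the residues that are
# fused in the principal 16 S splice (444 to 1381)?
# Splice coordinates come from the text; BODY_END and the 5' ends of the
# species other than the major one are hypothetical placeholders.

BODY_END = 2000  # hypothetical 3' limit; it only needs to exceed residue 1381

# Each species is a list of retained genomic intervals (inclusive residues).
SPECIES_19S = {
    "ungapped": [(243, BODY_END)],
    "major (243-291 leader on body)": [(243, 291), (476, BODY_END)],
    "splice 444 to 476": [(243, 444), (476, BODY_END)],
    "splice 212 to 476": [(182, 212), (476, BODY_END)],
}


def retains(intervals, residue):
    """Return True if the residue lies inside any retained interval."""
    return any(first <= residue <= last for first, last in intervals)


def could_yield_16s(intervals, donor=444, acceptor=1381):
    """Assumed criterion: both residues fused in 16 S RNA are still present."""
    return retains(intervals, donor) and retains(intervals, acceptor)


CANDIDATES = {name: could_yield_16s(iv) for name, iv in SPECIES_19S.items()}

if __name__ == "__main__":
    for name, ok in CANDIDATES.items():
        print(f"{name}: {'possible precursor' if ok else 'excluded'}")
```

Under these assumptions the major 19 S RNA and the species with the 212 to 476 splice are excluded because both lack residue 444, whereas the ungapped species and the species with the small 444 to 476 gap remain possible precursors.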
The AUG initiation codon for VP1 synthesis lies 245 nucleotides from the 5’ terminus of the principal 16 S mRNA, the presumptive initiation codon for VP2 synthesis lies from 52 to 395 nucleotides from the 5’ termini of the various 19 S RNA species and the presumptive initiator for VP3 synthesis lies 296 nucleotides from the 5’ terminus of the putative 18.5 S RNA. The mechanism(s) by which ribosomes bind to mRNAs is not known; however, it has been shown that 5’-terminal capped structures play an important role in ribosomal binding to mRNAs and that

bound ribosomes protect fragments of 40 to 60 nucleotides about initiation AUG triplets (Steitz, 1978). Two mechanisms accounting for the important role of both 5’-terminal capped structures and AUG sequences in ribosomal binding and initiation of translation on the SV40 late mRNAs, in which there is a considerable distance between 5’ termini and AUG initiators, may be entertained. First, the 5’ termini and internal AUG triplets may be effectively approximated as a result of base-pairing within the intervening sequences or between sequences near 5’ termini and the ribosomal binding sites, permitting ribosomes to roll directly from 5’ termini to internal AUG sequences. Secondly, it is possible that ribosomes bind to 5’ termini and then move along mRNA until an initiator AUG is reached, as proposed by Kozak & Shatkin (1978). In either case the distance between 5’-terminal caps and AUG triplets would not be critical for initiation of translation. It is, on the other hand, possible that the mRNAs with long leaders are less efficient messages than those with short leaders. We have pointed out previously (Celma et al., 1977a; Ghosh et al., 1978) the presence of an AUG sequence on the 16 S mRNA leader only 11 nucleotides from the 5’ terminus. Although it is followed by 61 sense codons and then a terminator, we have no evidence that it is translated. Additional AUG sequences are also present on the various 19 S leader sequences we have described at residues 5200 to 5202, 5212 to 5214, 17 to 19, 41 to 43, 61 to 63, 65 to 67, 116 to 118, 120 to 122 and 303 to 305. However, there is no evidence suggesting that they serve as translation initiation sites and all are followed by terminator codons short distances downstream. Since ribosomal attachment to any of these AUG sites on the leaders would probably not permit initiation of translation at the downstream VP1 and VP2 initiation codons, we assume that mechanisms exist for making these AUG sequences inaccessible to ribosomes.
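The bookkeeping behind the statements about leader AUG triplets (their positions, and the number of sense codons separating each from the next in-frame terminator) can be sketched as follows. This is a minimal illustration under stated assumptions: `TOY_MRNA` is an invented sequence, not an SV40 leader, and the function names are hypothetical. It lists every AUG in the order in which a ribosome moving from the 5’ terminus would meet them, and counts the sense codons that follow each one.

```python
# Minimal sketch: locate AUG triplets along an mRNA and count the sense codons
# between each AUG and the first in-frame terminator.
# TOY_MRNA is an invented sequence used only for illustration.

TERMINATORS = {"UAA", "UAG", "UGA"}

# 5' GG | AUG CCC UAA | GGCACC | AUG GCU UCU UGA 3'
TOY_MRNA = "GGAUGCCCUAAGGCACCAUGGCUUCUUGA"


def aug_positions(mrna):
    """Return 0-based indices of every AUG triplet, 5' to 3'."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def sense_codons_before_stop(mrna, aug_index):
    """Count codons after the AUG up to the first in-frame terminator.

    Returns None if no in-frame terminator is found.
    """
    count = 0
    for i in range(aug_index + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in TERMINATORS:
            return count
        count += 1
    return None


if __name__ == "__main__":
    for pos in aug_positions(TOY_MRNA):
        n = sense_codons_before_stop(TOY_MRNA, pos)
        print(f"AUG at {pos} nucleotides from the 5' terminus: "
              f"{n} sense codons before a terminator")
```

In the toy sequence the 5’-proximal AUG opens a reading frame that is closed after a single sense codon, which is the situation described for the leader AUG triplets: a ribosome initiating there would terminate before reaching the downstream initiator.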
Some of these sequences may undergo base-pairing with other leader sequences such that they cannot interact with ribosomes.

We thank Dr J. Beard, Research Resources, Viral Oncology Program, National Cancer Institute, for the very generous gift of avian myeloblastosis virus reverse transcriptase, Ms Regina Ezekial and Vinnie Itri and Messrs Allan McCluskey and Nick Davis for excellent technical assistance and Mrs Betty Hackett for typing this manuscript. We also thank Dr Kim DeReil for valuable advice concerning reverse transcription of RNAs, Dr Don Crothers for helpful discussions concerning models for the secondary structure of the late RNAs and Dr Paul Berg for informing us of the presence of RNase in certain preparations of reverse transcriptase and valuable discussions concerned with the interpretations and significance of our results. This work was supported by grants from the American Cancer Society, The National Cancer Institute and The Leonard Eckstein Living Fund for Leukemia Research.

REFERENCES

Aloni, Y. (1974). Cold Spring Harbor Symp. Quant. Biol. 39, 165-178.
Aloni, Y. (1977). Fed. Eur. Biochem. Soc. Letters, 54, 363-367.
Aloni, Y., Shani, M. & Reuveni, Y. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 2587-2591.
Aloni, Y., Dhar, R., Laub, D., Horowitz, M. & Khoury, G. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 3686-3690.
Aviv, H. & Leder, P. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 1408-1412.
Bachenheimer, S. & Darnell, J. E. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 4445-4449.
Berget, S. M., Moore, C. & Sharp, P. C. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 3171-3175.
Berk, A. J. & Sharp, P. A. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 1274-1278.
Bigger, C. H., Murray, K. & Murray, N. E. (1973). Nature New Biol. 244, 7-9.

Carroll, R. B. & Smith, A. E. (1976). Proc. Nat. Acad. Sci., U.S.A. 73, 2254-2258.
Casey, J. & Davidson, N. (1977). Nucl. Acids Res. 4, 1539-1552.
Celma, M. L., Dhar, R., Pan, J. & Weissman, S. M. (1977a). Nucl. Acids Res. 4, 2549-2559.
Celma, M., Pan, J. & Weissman, S. (1977b). J. Biol. Chem. 252, 9043-9046.
Chou, L. T., Gelinas, R. E., Broker, T. R. & Roberts, R. J. (1977). Cell, 12, 1-8.
Cole, C., Landers, T., Goff, S., Manteuil-Brutlag, S. & Berg, P. (1977). J. Virol. 24, 277-294.
Crawford, L. V., Cole, C. N., Smith, A. E., Paucha, E., Tegtmeyer, P., Rundell, K. & Berg, P. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 117-121.
Danna, K. & Nathans, D. (1971). Proc. Nat. Acad. Sci., U.S.A. 68, 2913-2917.
Dhar, R., Subramanian, K., Zain, B. S., Pan, J. & Weissman, S. M. (1974). Cold Spring Harbor Symp. Quant. Biol. 39, 153-160.
Dhar, R., Subramanian, K. N., Zain, B. S., Levine, A., Patch, C. & Weissman, S. M. (1975). INSERM, 47, 25-32.
Dhar, R., Subramanian, K. N., Pan, J. & Weissman, S. M. (1977a). Proc. Nat. Acad. Sci., U.S.A. 74, 827-831.
Dhar, R., Subramanian, K. N., Pan, J. & Weissman, S. M. (1977b). J. Biol. Chem. 252, 368-376.
Evans, R., Fraser, N., Ziff, E., Weber, J., Wilson, M. & Darnell, J. E. (1977). Cell, 12, 733-739.
Ferdinand, G., Brown, M. & Khoury, G. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5443-5447.
Fiers, W., Contreras, R., Haegeman, G., Rogiers, R., Van de Voorde, A., Van Heuverswyn, H., Van Herreweghe, J., Volckaert, G. & Ysebaert, M. (1978). Nature (London), 273, 113-120.
Ghosh, P. K., Reddy, V. B., Swinscoe, J., Choudary, P., Lebowitz, P. & Weissman, S. M. (1978). J. Biol. Chem. 253, 3643-3647.
Gibson, W. (1975). Virology, 68, 539-543.
Goldberg, S., Weber, J. & Darnell, J. E. (1977). Cell, 10, 617-621.
Gralla, J. & Crothers, D. M. (1973a). J. Mol. Biol. 73, 497-511.
Gralla, J. & Crothers, D. M. (1973b). J. Mol. Biol. 78, 301-319.
Groner, Y., Carmi, P. & Aloni, Y. (1977). Nucl. Acids Res. 4, 3959-3968.
Haegeman, G. & Fiers, W. (1978a). J. Virol. 25, 824-830.
Haegeman, G. & Fiers, W. (1978b). Nature (London), 273, 70-73.
Hsu, M. & Ford, J. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 4982-4985.
Khoury, G. & May, E. (1977). J. Virol. 23, 167-176.
Khoury, G., Byrne, J. C. & Martin, M. A. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 1925-1928.
Khoury, G., Martin, M. A., Lee, T. N. H., Danna, K. & Nathans, D. (1973). J. Mol. Biol. 78, 377-389.
Khoury, G., Howley, P., Nathans, D. & Martin, M. (1975). J. Virol. 15, 433-437.
Khoury, G., Carter, B. J., Ferdinand, F. J., Howley, P. M., Brown, M. & Martin, M. A. (1976). J. Virol. 17, 832-840.
Klessis, D. (1977). Cell, 12, 9-22.
Knapp, G., Beckmann, J. S., Johnson, P. F., Fuhrman, S. A. & Abelson, J. (1978). Cell, 14, 221-236.
Kozak, M. & Shatkin, A. (1978). J. Biol. Chem. 253, 6568-6577.
Lavi, S. & Groner, Y. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5323-5327.
Lavi, S. & Shatkin, A. J. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 2012-2016.
Lindstrom, D. M. & Dulbecco, R. (1972). Proc. Nat. Acad. Sci., U.S.A. 69, 1517-1520.
Maxam, A. M. & Gilbert, W. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 560-564.
May, E., Kopecka, H. & May, P. (1976). Nucl. Acids Res. 2, 1995-2005.
May, E., Maizel, J. V. & Salzman, N. P. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 496-500.
McMacken, R., Ueda, K. & Kornberg, A. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 4190-4194.
Murray, K. & Old, R. W. (1974). Progr. Nucl. Acid Res. Mol. Biol. 14, 117-185.
Murray, K., Hughes, S. G., Brown, J. S. & Bruce, S. A. (1976). Biochem. J. 159, 317-322.

Pan, J., Reddy, V. B., Thimmappaya, B. & Weissman, S. M. (1977). Nucl. Acids Res. 4, 2539-2548.
Peacock, A. C. & Dingman, C. W. (1968). Biochemistry, 7, 668-674.
Penman, S. (1966). J. Mol. Biol. 17, 117-130.
Prives, C. L., Aviv, H., Gilboa, E., Revel, M. & Winocour, E. (1974a). Cold Spring Harbor Symp. Quant. Biol. 39, 309-316.
Prives, C. L., Aviv, H., Paterson, B. M., Roberts, B. E., Rosenblatt, H. S., Revel, M. & Winocour, E. (1974b). Proc. Nat. Acad. Sci., U.S.A. 71, 302-306.
Prives, C. L., Aviv, H., Gilboa, E., Winocour, E. & Revel, M. (1975). INSERM, 47, 305-312.
Prives, C. L., Gilboa, E., Revel, M. & Winocour, E. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 457-461.
Reddy, V. B., Dhar, R. & Weissman, S. M. (1978a). J. Biol. Chem. 253, 621-630.
Reddy, V. B., Thimmappaya, B., Dhar, R., Subramanian, K. N., Zain, B. S., Pan, J., Celma, M. L., Ghosh, P. K. & Weissman, S. M. (1978b). Science, 200, 494-502.
Reddy, V. B., Ghosh, P. K., Lebowitz, P. & Weissman, S. (1979). Nucl. Acids Res. In the press.
Roberts, R. J., Meyers, P. A., Morrison, A. & Murray, K. J. (1976). J. Mol. Biol. 102, 157-165.
Sambrook, J., Sharp, P. A. & Keller, W. (1972). J. Mol. Biol. 70, 57-71.
Sambrook, J., Sugden, B., Keller, W. & Sharp, P. A. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3711-3715.
Schertzinger, E., Lanka, E. & Hillenbrand, G. (1977). Nucl. Acids Res. 4, 4151-4163.
Schibler, U. & Perry, R. P. (1976). Cell, 9, 121-130.
Shenk, T. E., Carbon, J. & Berg, P. (1976). J. Virol. 18, 664-671.
Sleigh, M. J., Topp, W. C., Hanich, R. & Sambrook, J. F. (1978). Cell, 14, 79-88.
Steitz, J. (1978). In Genetic Signals and Nucleotide Sequences in Messenger RNAs in Biological Regulation and Development (Goldberger, R., ed.), Plenum Publishing Company, New York.
Thimmappaya, B., Reddy, V. B., Dhar, R., Celma, M., Subramanian, K. N., Zain, B. S., Pan, J. & Weissman, S. M. (1978). Cold Spring Harbor Symp. Quant. Biol. In the press.
Tilghman, S. M., Curtis, P. J., Tiemeier, D. C., Leder, P. & Weissman, C. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 1309-1313.
Tonegawa, S., Maxam, A. M., Tizard, R., Bernard, O. & Gilbert, W. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 1485-1489.
Uhlenbeck, O. C., Borer, P. N., Dengler, B. & Tinoco, I. Jr (1973). J. Mol. Biol. 73, 483-496.
Van de Voorde, A., Contreras, P., Rogiers, P. & Fiers, W. (1974). Cell, 9, 117-120.
Vennstrom, B., Pettersson, U. & Philipson, L. (1978). Nucl. Acids Res. 5, 195-204.
Weber, J., Jelinek, W. & Darnell, J. E. Jr (1977). Cell, 10, 611-616.
Wei, C. M. & Moss, B. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 3758-3761.
Weinberg, R. A. & Newbold, J. E. (1974). Cold Spring Harbor Symp. Quant. Biol. 39, 161-163.
Weinberg, R. A., Warnaar, S. O. & Winocour, E. (1972). J. Virol. 10, 193-201.
Weinberg, R. A., Ben-Ishai, Z. & Newbold, J. E. (1974). J. Virol. 13, 1263-1273.
Zain, B. S., Dhar, R., Weissman, S. M., Lebowitz, P. & Lewis, A. M. Jr (1973). J. Virol. 11, 682-693.

Note added in proof: Following submission of this paper, Breathnach et al. (1978) (Proc. Nat. Acad. Sci., U.S.A. 75, 4853-4857) and Cattarall et al. (1978) (Nature, 75, 725-728) published nucleotide sequences of 10 splices in chicken ovalbumin mRNA. In accord with our findings for SV40 RNAs, all 10 splices occurred within short reiterated sequences (G, AG, GG, UCAG and AGGU) at the termini of donor and acceptor mRNA segments, creating ambiguities in assigning precise splice points.
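The ambiguity referred to in this note, and in the discussion of duplicated sequences at donor and acceptor termini, can be illustrated with a toy precursor. The sketch below is not based on any SV40 or ovalbumin sequence: `EXON1`, `INTRON` and `EXON2` are invented strings in which the dinucleotide A-G ends both the donor segment and the excised segment. The hypothetical function `equivalent_junctions` lists every pair of cut points that removes a segment of the same length and yields the identical spliced product.

```python
# Toy illustration of splice-point ambiguity caused by a short reiterated
# sequence at the donor 3' end and at the 3' end of the excised segment.
# All sequences are invented for illustration only.

EXON1 = "CCUCAG"       # donor segment, ends in A-G
INTRON = "UUCCUUUGAG"  # excised segment, also ends in A-G
EXON2 = "CCAUGG"       # acceptor segment
PRECURSOR = EXON1 + INTRON + EXON2

DONOR_END = len(EXON1)                      # cut after this many residues
ACCEPTOR_START = len(EXON1) + len(INTRON)   # resume at this index


def splice(seq, donor_end, acceptor_start):
    """Excise seq[donor_end:acceptor_start] and ligate the flanking parts."""
    return seq[:donor_end] + seq[acceptor_start:]


def equivalent_junctions(seq, donor_end, acceptor_start):
    """Return all cut pairs of the same gap length giving the same product."""
    product = splice(seq, donor_end, acceptor_start)
    gap = acceptor_start - donor_end
    return [
        (d, d + gap)
        for d in range(len(seq) - gap + 1)
        if splice(seq, d, d + gap) == product
    ]


if __name__ == "__main__":
    print("spliced product:", splice(PRECURSOR, DONOR_END, ACCEPTOR_START))
    for d, a in equivalent_junctions(PRECURSOR, DONOR_END, ACCEPTOR_START):
        print(f"cut after residue {d}, resume at residue {a + 1}")
```

With a reiterated dinucleotide the sketch finds three indistinguishable placements of the cut points, which is why sequence data on the spliced product alone cannot assign the precise termini of donor and acceptor segments.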