Sequences of human adenovirus Ad3 and Ad7 DNAs encoding the promoter and first leader segment of late RNAs


Gene, 13 (1981) 133-143

Elsevier/North-Holland Biomedical Press

Sequences of human adenovirus Ad3 and Ad7 DNAs encoding the promoter and first leader segment of late RNAs (Promoter for RNA transcription; RNA splicing; nucleotide sequence symmetry)

Jeffrey A. Engler, Louise T. Chow and Thomas R. Broker
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 (U.S.A.)

(Received October 7th, 1980) (Accepted November 30th, 1980)

SUMMARY

DNA segments containing the major promoter at coordinate 16.5 for rightward transcription from human adenovirus serotypes 3 and 7 (Ad3 and Ad7), two closely related class B viruses, have been sequenced and found to be virtually identical. Furthermore, over 80% of the nucleotides of Ad3 and Ad7 in this entire region are homologous to their counterparts in the DNA of the more distantly related class C serotype Ad2. The region compared contains the same number of nucleotide pairs in all three serotypes. Most changes are transitions or transversions, and the several single-base deletions are always compensated by nearby insertions. These few changes nonetheless result in 24 differences between Ad7 (or Ad3) and Ad2 in a total of 32 cleavage sites. The promoter for the rightward-transcribed RNAs and the first segment of the consanguineous tripartite leader found at the 5'-ends of all the late mRNAs derived from that promoter have been identified by analogy to the nucleotide sequences of Ad2. In particular, the "Hogness box" or RNA polymerase staging site for the major rightward transcription unit is completely homologous to that of Ad2. There are only six bp changes in the first late leader segment, despite previous evidence suggesting that the leaders might be quite heterologous. A prominent dyad axis of symmetry exists just upstream from the presumed 5'-end of the late RNA. However, the base changes relative to Ad2 mandate a potential stem-loop structure in the single strand of Ad3 and Ad7 DNAs that differs from the one proposed for Ad2 by Ziff and Evans (1978). This hairpin places the "Hogness box" immediately next to the 5'-end of the RNA at the base of the stem. An analogous dyad axis of symmetry or stem-loop structure can be found in a number of eukaryotic systems, including the major rightward transcription unit of Ad2. This feature may be of relevance to the positioning of RNA polymerase II on the DNA and to the promotion of transcription.

Abbreviations: Ad2, Ad3, and Ad7, adenovirus of serotypes 2, 3 and 7, respectively; bp, base pairs.

0378-1119/81/0000-0000/$ 02.50 © Elsevier/North-Holland Biomedical Press

INTRODUCTION

RNA transcription during productive infections of cultured human cells by adenovirus can be divided into immediate early, early, intermediate, and late stages (Lewis et al., 1980). Almost all the messenger RNAs are spliced (Berget et al., 1977; Chow et al., 1977; 1978; 1979; Kitchingman et al., 1977; Berk and Sharp, 1978; Kilpatrick et al., 1979). Most of the late RNAs of class B (Ad3 and Ad7) and class C (Ad1, 2, 5, and 6) viruses are transcribed from the same promoter at coordinate 16.5 on the r-strand of their respective DNAs. More recently, this same promoter has been found to function also at early times and in the absence of DNA replication, but it gives rise to only a subset of the mRNAs seen at late times (Chow et al., 1979; 1980; Lewis and Mathews, 1980). All the mRNAs derived from this promoter have a tripartite leader spliced to their 5'-ends. The late leader sequences and the main bodies to which the leaders are attached are derived from virtually the same locations in each of the genomes. The leader sequences of class B viruses or of class C viruses are homologous within the class, but the leaders of class B viruses do not cross-anneal with class C viral DNA, and vice versa, which indicates sequence divergence (Kilpatrick et al., 1979). The nucleotide sequence of the tripartite leader of Ad2, a class C virus, has been determined (Akusjärvi and Pettersson, 1979a; Zain et al., 1979b). The first leader segment is promoter-proximal (Evans et al., 1977) and contains a sequence which may be required for binding to the 3'-end of 18S ribosomal RNA (Ziff and Evans, 1978). The function of the second and third leader segments is not yet apparent. We have taken the approach of comparative sequence analyses in an attempt to determine the reason for the segmented structure of the late RNA leaders, to define conserved primary sequences or secondary structures that might be involved in signal recognition, and to establish the nature and extent of sequence divergence between classes. Selected restriction fragments of Ad3 and Ad7 have been cloned into the vector plasmid pBR322 (Engler and Kilpatrick, 1981).
This paper reports our additional cloning and restriction endonuclease mapping as well as sequence analyses of the DNA segments that contain the promoter for the major rightward transcription unit and encode the first leader segments and splice sites of Ad3 and Ad7 late mRNAs.

MATERIALS AND METHODS

(a) Preparation of plasmid DNA

Bacterial strains, growth media, enzymes and buffers, and the isolation of plasmid DNA are described in the accompanying paper (Engler and Kilpatrick, 1981). Stocks of plasmid DNA were isolated from either E. coli strain HB101 or HB294 (Boyer and Roulland-Dussoix, 1969). After dialysis to remove CsCl, the plasmids were treated with pancreatic RNase I (Sigma, boiled 10 min before use to inactivate DNases) to digest contaminating RNA, extracted with phenol twice, and dialyzed against three changes of 0.1 M NaCl, 10 mM Tris·HCl, 1 mM EDTA, pH 7.9, then against two changes of 10 mM Tris·HCl, 1 mM EDTA, pH 7.9.

(b) Cloning of DNA encoding the first leader segment

Plasmids pJB329 (with a viral insert from coordinates 9.6 to 36.7) and pJB757 (with an insert from viral coordinates 15.9 to 36.7) contain the DNA encoding the three 5'-leader segments of late viral RNA (Engler and Kilpatrick, 1981). Each was digested with restriction endonuclease HindIII and then religated to obtain subclones containing just the first of the three leader-coding regions and lacking all viral sequences to the right of coordinate 17.0. The final concentration of DNA in the ligation reaction was 1-3 µg/ml. The DNA was used to transform E. coli HB294 by the procedure of Mandel and Higa (1970) as described in the accompanying paper. Ampicillin-resistant, tetracycline-sensitive colonies were selected and suspended in 0.1 ml Luria Broth (LB) + ampicillin (20 µg/ml) in a microtiter plate and incubated overnight at 37°C. Aliquots from each microtiter well were lysed and their DNA analyzed by agarose gel electrophoresis as described by Barnes (1978). Plasmids that were shorter than pJB329 or pJB757 and that were of the anticipated size (similar to pBR322 for Ad7 subclones, or slightly bigger for Ad3 subclones) were examined by digestion with BamHI or HindIII, or both. One subclone from each serotype was chosen for further study: pJB329A, which contains an insert of Ad3 DNA from coordinates 9.6 to 17.0, and pJB757A, which contains an insert of Ad7 DNA from coordinates 15.9 to 17.0.

(c) Restriction enzyme mapping of plasmids containing the first leader-coding segment

Restriction endonuclease cleavage sites were determined according to the procedure of Smith and Birnstiel (1976) except that the labeled fragments were digested to completion with the restriction endonuclease to be mapped. 20 µg of plasmid DNA were linearized with either restriction endonuclease HindIII, BamHI (pJB757A) or XhoI (pJB329A) and concentrated by ethanol precipitation. For mapping of cleavage sites relative to the XhoI site of pJB329A, the digestion products were separated by electrophoresis on a 1% agarose gel and the longest band (containing the first leader-coding segment) was electroeluted as described by Tabak and Flavell (1978). The 5'-terminal DNA extensions at the cleavage sites were filled in with a mixture of four α-32P-labeled deoxynucleoside triphosphates (specific activity 350 Ci/mmol, 15 pmol each) and 1 unit of the Klenow fragment of DNA polymerase I. After 60 min at 37°C, the reaction was terminated by addition of EDTA and precipitation with 2 M ammonium acetate and 70% ethanol. The pellet was resuspended in 0.3 M sodium acetate and reprecipitated with 2 volumes of ethanol. The pellet was washed once with 70% ethanol, dried under vacuum and resuspended in H2O. Unlabeled pBR322 DNA was added as carrier to aliquots of the labeled samples and the preparation was digested with AccII, AluI, HaeIII, HhaI, HinfI, MnlI, RsaI, or Sau3A restriction endonuclease. Digestion products were separated by electrophoresis on 8% acrylamide gels (40 cm long, 5 V/cm) for 12 h in E buffer (40 mM Tris-base, 20 mM sodium acetate, 1 mM EDTA, pH 7.2). Sau3A fragments of pBR322 DNA that were end-labeled in a similar fashion were used as length standards. Each gel was fixed in 10% glacial acetic acid and autoradiographed with Kodak NS-5T No Screen Film. In general, two intense bands appeared in the autoradiograph, one derived from the DNA insert and the other from the vector. The band derived from the vector was identified from its expected length inferred from the nucleotide sequence of pBR322 (Sutcliffe, 1979).

(d) Nucleotide sequence analysis

20 µg of plasmid DNA pJB329A or pJB757A were digested with restriction endonucleases, followed by treatment with calf intestine alkaline phosphatase (Boehringer-Mannheim, 0.1-0.3 unit, purified as described by Efstratiadis et al., 1977) for 1 h at 37°C. The DNA was extracted twice with phenol followed by ether, precipitated with ethanol, dried under vacuum, and resuspended in double-distilled water. The 5'-ends of the DNA were radioactively labeled using [γ-32P]ATP (Amersham) and T4 polynucleotide kinase (New England Biolabs) according to Richardson (1965), except that the DNA was heat-denatured and the solution rapidly chilled in ice before labeling. After 30 min at 37°C, followed by 10 min at 60°C to promote renaturation of the DNA, the reaction was stopped by the addition of an equal volume of 0.1 M EDTA, and the DNA was precipitated with 2 M ammonium acetate and 70% ethanol using 1 µl of a 20 mg/ml yeast RNA stock solution (PL Biochemicals) as carrier. The DNA was resuspended in 0.3 M Na·acetate (pH 7.2) and ethanol-precipitated once more. The pelleted DNA was washed once with 70% ethanol, dried under vacuum, and resuspended in water. The DNA was redigested with a second restriction endonuclease and the products separated by electrophoresis (5 V/cm) in an 8% acrylamide gel in E buffer. After autoradiography, gel slices containing the labeled end fragments were cut out of the gel and the DNA eluted overnight in 0.3 M Na·acetate (pH 7.2). After filtration through a 5 ml Quik-Sep column containing a fiberglass disc (QS-P, Isolab, Inc., Akron, Ohio) to remove acrylamide fragments, the eluate was again precipitated with ethanol, washed once with 70% ethanol, dried under vacuum, and resuspended in 10-40 µl of water. The nucleotide sequence was determined by the chemical degradation method of Maxam and Gilbert (1980), except for a few nucleotides near the HindIII site at 17.0, which were determined by the chain termination method of Sanger et al. (1977).

RESULTS

(a) Subcloning and detailed restriction-endonuclease mapping of Ad3 and Ad7 plasmids containing the sequences coding for the first late-RNA leader segments

To facilitate the determination of the Ad3 and Ad7 nucleotide sequences encoding the first leader segment of late RNA, some extraneous DNA sequences were removed from pJB329 and pJB757 (Engler and Kilpatrick, 1981) by digestion with restriction endonuclease HindIII and religation of the digestion products. Only products containing the first leader segment joined to pBR322 DNA were anticipated. Four ampicillin-resistant, tetracycline-sensitive transformants from digested and religated pJB329 or pJB757 were tested. All contained one BamHI and one HindIII cleavage site. Digestion with both of these endonucleases yielded two bands, one derived from the vector and the other from the viral DNA insert, which corresponded to the segment containing the first leader coding sequence (data not shown). Two of the plasmids were studied further: plasmid pJB329A, derived from pJB329, contained Ad3 DNA from genome coordinates 9.6 to 17.0, and plasmid pJB757A, derived from pJB757, contained Ad7 DNA from coordinates 15.9 to 17.0.


Detailed maps of the restriction endonuclease cleavage sites of plasmids pJB329A and pJB757A were prepared as described in MATERIALS AND METHODS and are shown in Fig. 1. A total of eight restriction endonucleases were utilized. Cleavage sites that were mapped as well as sites that were subsequently inferred from the nucleotide sequence are presented. The positions of Ad3 and Ad7 cleavage sites in this region were virtually identical, with only one exception: an extra HaeIII site was found in Ad3 at nucleotide -16. Although not detected by restriction enzyme mapping, the HaeIII site at nucleotide 175 (Fig. 1) (corresponding to sequence position -63 in Fig. 2) in Ad7 was also present in Ad3, as revealed later by the nucleotide sequence of the region.


Fig. 1. Restriction endonuclease mapping of Ad3 and Ad7 and sequencing strategy in the vicinity of the DNA encoding the first leader segment. (a) The cleavage sites were determined relative to the XhoI, BamHI and HindIII sites (long vertical lines) mapped by Tibbetts (1977) using plasmids pJB329A and pJB757A, as described in Materials and Methods. The restriction endonucleases have been abbreviated as follows: AccII (C), AluI (A), HaeIII (E), HhaI (H), HinfI (F), MnlI (M), RsaI (R), Sau3A (S). Dashed lines show the positions of additional cleavage sites determined from the nucleotide sequence. (b) The arrow marks the position of the first leader segment and points in the direction of transcription. (c) The sequencing strategy for Ad7. Arrows denote the region of sequence read in each reaction and point from 5' to 3'. (d) The sequencing strategy for Ad3.

(b) Nucleotide sequence derivation of the region encoding the first late RNA leader

The DNA sequences were determined from pJB329A and pJB757A by chemical degradation methods (Maxam and Gilbert, 1980) except that a few nucleotides near coordinate 17.0 were obtained by the chain termination method of Sanger et al. (1977). Almost all of both strands of Ad7 were sequenced but, for the most part, the sequence of only one strand of Ad3 was determined. The sequencing strategies are shown in Figs. 1c and 1d. The l-strand DNA sequence (which is the same as the late, rightward transcribed RNA sequence) is presented in Fig. 2. The position of the first leader segment, deduced by analogy to the nucleotide sequence of the first leader-coding region of Ad2 (Akusjärvi and Pettersson, 1979a, b; Zain et al., 1979a, b), is indicated.


The DNA sequence is numbered starting at the first (5') nucleotide of the inferred leader sequence. This position corresponds to approximately nucleotide 240 in Fig. 1. The beginning of the "Hogness box", a component of the presumptive promoter for the major rightward transcript, is located 31 nucleotides upstream from the 5'-end of the leader (position -31). Only four nucleotides differ between Ad3 and Ad7 (Fig. 2): a G, instead of A, at position -138 in Ad3 removes the BamHI cleavage site found in Ad7; a G, instead of A, at position -16 generates a HaeIII cleavage site in Ad3; a G, instead of T, at position 61 and a C, instead of G, at position 145 in Ad3 affect no cleavage site. All cleavage sites previously determined as shown in Fig. 1 are present in their expected positions in the DNA sequences.
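The way a single base substitution creates or abolishes a cleavage site can be illustrated with a small scan for recognition sequences. The following Python sketch is illustrative only: the function name `find_sites` and the short test sequences are hypothetical and are not taken from Fig. 2, and only the recognition sequences of the enzymes themselves are standard.

```python
import re

# Recognition sequences (5'->3') of some enzymes used in this study.
# "N" in the HinfI site stands for any base.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "HaeIII": "GGCC",
    "HindIII": "AAGCTT",
    "AluI": "AGCT",
    "HhaI": "GCGC",
    "HinfI": "GANTC",
    "RsaI": "GTAC",
    "Sau3A": "GATC",
}


def find_sites(seq, enzymes=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
        positions = [m.start() for m in re.finditer(pattern, seq)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical toy sequences (assumptions for illustration, not real viral DNA).
with_bamhi = "TTGGATCCACCT"     # carries GGATCC
without_bamhi = "TTGGGTCCACCT"  # same sequence with A -> G at index 4
without_haeiii = "TGACCA"
with_haeiii = "TGGCCA"          # A -> G at index 2 creates GGCC

print(find_sites(with_bamhi))
print(find_sites(without_bamhi))
print(find_sites(with_haeiii))
```

In the first toy pair the A-to-G change removes the nested Sau3A site (GATC) together with the BamHI site, which shows how one substitution can alter more than one entry in a restriction map.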

[Fig. 2. Aligned nucleotide sequences of Ad2 and Ad7 DNAs, written 5' to 3' and numbered from -140 to 240, with the promoter, the first leader segment, and the BamHI, Sau3A, HhaI, AluI, HinfI, RsaI, MnlI and HindIII cleavage sites marked.]

Fig. 2. The nucleotide sequence of Ad2 (Baker and Ziff, 1979) and Ad7 DNAs between coordinates 15.9 and 17.0 containing the major rightward promoter. The position of the presumptive promoter sequence or polymerase staging site ("Hogness box") for rightward transcription and the first leader segment of the r-strand transcripts of Ad7 were assigned by inference based on the analogy to the Ad2 sequences. Divergences in sequence between Ad2 and Ad7 are shaded. The sequences of Ad7 and Ad3 DNA in this region are identical except for the four differences noted in the text. The restriction endonuclease cleavage sites, mapped as described in Fig. 1, are underlined in the DNA sequence.
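A comparison of the kind summarized in Fig. 2 can be tallied position by position. The sketch below is a minimal illustration that assumes two gap-free aligned sequences of equal length. The function name `compare_aligned` and the ten-nucleotide example strings are hypothetical and do not reproduce any part of the viral sequences.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Count identities, transitions and transversions between two
    equal-length, gap-free aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            identical += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    return {
        "length": len(seq_a),
        "identical": identical,
        "transitions": transitions,
        "transversions": transversions,
        "percent_identity": 100.0 * identical / len(seq_a),
    }


# Hypothetical toy alignment: one transition (A/G) and one transversion (A/C).
toy = compare_aligned("ACGTACGTAC", "ACGTGCGTCC")
print(toy)

# The same arithmetic applied to the first leader counts given in the text:
# 35 identical positions out of 41.
leader_identity = 100.0 * 35 / 41
print(round(leader_identity, 1))
```

A tally of this sort does not handle the compensated single-base deletions and insertions described in the text; those would first require an alignment step, which is outside the scope of this sketch.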


(c) Analysis of the DNA sequences and further discussion

The most striking feature of the DNA sequence obtained near the region encoding the first late RNA leader is the high degree of homology among class C adenovirus 2 and class B adenoviruses 3 and 7: 80% of the nucleotides are identical. Because of this similarity, the positions of the presumptive major rightward RNA polymerase staging site (the "Hogness box") and the first leader segment in Ad3 and Ad7 were assigned by analogy to those of Ad2 (Akusjärvi and Pettersson, 1979a, b; Ziff and Evans, 1978; Zain et al., 1979a, b). The sequences of the "Hogness box" and of the 5'- and 3'-ends of the first leader were conserved in all three serotypes. The assignment of the first leader coding sequence is supported by the presence at the presumptive splice junction of the sequence GSGTPuPuG, a consensus sequence found at many splice sites of Ad2 and other eukaryotic systems (Breathnach et al., 1978; Akusjärvi and Pettersson, 1979b; Zain et al., 1979b; Perricaudet et al., 1979; Seif et al., 1979; Lerner et al., 1980; Rogers and Wall, 1980). We note that the sequence at the beginning of the intervening sequence almost always contains one of the three protein termination codons (UAA, UGA, or UAG). Although there are two changes between class C and class B adenoviruses in this region, this observation remains satisfied (Fig. 2). Lerner et al. (1980) and Rogers and Wall (1980) have independently recognized that the small nuclear RNA U1 contains a short sequence which is complementary to the consensus upstream splice junctions. The trinucleotides UAA, UGA, and UAG found at splice junctions are all able to form at least two A-U base pairs with the three nucleotides (AUU) implicated in the proposed binding of U1 RNA to splice junctions. However, the trinucleotide UGG, rarely seen in splice junctions determined so far, could only form one A-U and two U-G base pairs with the AUU of the U1 RNA. This latter interaction might not be sufficient for binding of U1 (or a similar small nuclear) RNA to a primary (unspliced) RNA molecule because U-G base pairs provide little stabilizing force.

The primary divergences in the sequences of the two classes of virus are transitions or transversions; there are only a few single-base deletions and these are always compensated by single-base insertions nearby. Scattered divergences occur in sequences located between the Hogness box and the first leader, in 6 of 41 nucleotides within the first leader, and close to the 3' (upstream donor) splice junction, which is eventually linked to the 5'-side of the second leader segment in the RNA. These divergences apparently affect neither the primary transcription nor the splicing of the first leader segment to the second leader. Because the 20% nonhomologous nucleotides between Ad2 and Ad3 or Ad7 are scattered throughout this region, only eight out of 32 restriction endonuclease cleavage sites are identical while 24 are different. This observation emphasizes that profound differences in restriction maps do not necessarily imply major differences in DNA sequence.

Kilpatrick et al. (1979) were unable to observe RNA : DNA heteroduplexes between the leaders of Ad3 or Ad7 late mRNA and Ad2 DNA (and vice versa) over a wide range of hybridization temperatures, based on electron microscopic analysis. In view of the relatively high degree of homology found in the sequences of the first leaders (35 out of 41 nucleotides, or 85% homology), several factors may have prevented formation of the desired heteroduplexes. Mismatches in otherwise homologous DNA sequences decrease the melting temperature (Tm) of the duplexes by 0.67-1.1°C for each 1% mismatch (Laird et al., 1969; Hutton and Wetmur, 1973; Smiley and Warner, 1979). In addition, very short regions of homology have a substantially lower Tm (designated Tm*) than regions of at least several hundred nucleotides:

Tm* = Tm − (500/nucleotide length)

(Wetmur and Davidson, 1968). Because this short 41-nucleotide sequence is broken into several small units by the scattered mismatches, the Tm of the heterologous RNA : DNA heteroduplexes might be expected to be much depressed relative to homologous RNA : DNA heteroduplexes, but it cannot be estimated accurately for lack of empirical data on analogous systems. RNA molecules also tend to develop very stable intrastrand secondary structures as the temperature is lowered. Such RNA self-pairing has most likely prevented the establishment of the heterologous RNA : DNA heteroduplexes that could have been expected to form at some of the temperature ranges tested by Kilpatrick et al. (1979).
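The two corrections cited above can be put into rough numbers for the 41-nucleotide first leader. The sketch below is only an order-of-magnitude illustration. It takes the length correction as Tm* = Tm - 500/length and the mismatch effect as 0.67-1.1°C per 1% mismatch, both from the text. The function names are hypothetical, and adding the two terms is an assumption of this sketch that the text does not claim to be valid for such a short, interrupted duplex.

```python
def mismatch_depression(percent_mismatch, per_percent=(0.67, 1.1)):
    """Range (low, high) of Tm depression in deg C for a given percent mismatch,
    using the 0.67-1.1 deg C per 1% mismatch figures quoted in the text."""
    low, high = per_percent
    return (low * percent_mismatch, high * percent_mismatch)


def length_depression(length_nt):
    """Tm depression in deg C for a short duplex, taken as 500 / length."""
    if length_nt <= 0:
        raise ValueError("length must be positive")
    return 500.0 / length_nt


# First leader: 41 nucleotides, 35 of them identical between the classes.
leader_length = 41
percent_mismatch = 100.0 * (41 - 35) / leader_length

low, high = mismatch_depression(percent_mismatch)
short = length_depression(leader_length)

print(f"mismatch: {percent_mismatch:.1f}%")
print(f"depression from mismatches: {low:.1f} to {high:.1f} deg C")
print(f"depression from short length: {short:.1f} deg C")
# Summing the terms is an assumption made here for illustration only.
print(f"combined (assumed additive): {low + short:.1f} to {high + short:.1f} deg C")
```

Because the scattered mismatches break the leader into even shorter perfectly paired units, the effective length term could be larger than this estimate, which is consistent with the caution expressed in the text.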

The overall homology between Ad2 (a class C virus) and Ad3 and Ad7 (class B viruses) has been estimated to range between 15% and 30% (Garon et al., 1973; Green et al., 1979). For this reason, the high degree of homology between Ad2 and Ad3 and Ad7 in this region was unexpected. Perhaps the first leader region has been highly conserved during adenoviral evolution because it contains the transcription start signals for almost all late viral genes. Alternatively, the other physical methods may have underestimated the homology between adenoviral serotypes because of the short interspersed divergent sequences that substantially destabilized the heteroduplexes. A similar underestimation of nucleotide sequence homology measured by several methods has also been found between SV40, polyoma, and BK papovaviruses (Ferguson and Davis, 1975; Yang and Wu, 1979; Howley et al., 1979; Soeda et al., 1980). At present, the DNA sequences between map coordinates 0 and 4.5 of Ad5 (class C) (Van Ormondt et al., 1978), Ad7 (class B) (Dijkema et al., 1980), and Ad12 (class A) (Sugisaki et al., 1980) serotypes have been determined. In regions where proteins such as DNA and RNA polymerases and RNA splicing systems must recognize polynucleotides, the sequences are quite highly conserved (approx. 70-80% homologous). The sequences are more extensively divergent (40-50% homology) in intervening sequences and in other noncoding regions, as well as within some protein coding regions where there might be less selective pressure or where the pressure would be exerted on the conservation of the functions of the proteins encoded rather than on polynucleotides (Sambrook et al., 1980).

To date, the mechanisms or signals by which RNA polymerase II recognizes a promoter are not known. Ziff and Evans (1978) suggested that the Ad2 DNA sequence between the "Hogness box" and the 5'-end of the RNA leader coding sequence could be aligned to form a hairpin structure (Fig. 3a). They speculated that this configuration might help stabilize an initiation complex for RNA transcription. When the Ad3 or Ad7 sequences are arranged in an analogous manner, the "hairpin" contains many mismatched nucleotides, a result of seven differences between the class B and class C viruses in this region (Fig. 3b). The polynucleotide chain can, however, be arranged in a slightly different configuration which brings the "Hogness box" and the 5'-end of the presumptive first leader coding segment into close proximity (Fig. 3d). It is noteworthy that, of the seven differences between Ad7 and Ad2 in this region, three create favorable base pairs in this alternative stem-loop structure and four constitute the turn-around loop, presumably a sequence devoid of selective pressure. Interestingly, the Ad2 DNA sequence can also be arranged in a manner analogous to that of Ad7 (Fig. 3c). Interactions with other macromolecules may help stabilize this alternative hairpin (Fig. 3c) relative to the potentially more stable structure proposed by Ziff and Evans (1978) (Fig. 3a). The attractive features of this second possible configuration are that the "Hogness box" and leader sequences are in close


Fig. 3. Possible hairpin structures formed between the promoter and first leader coding segment. The "TATA box" and first leader coding segment are indicated by lighter and darker shading, respectively. (a) Ad2 hairpin form suggested by Ziff and Evans (1978); (b) the analogous hairpin drawn using the Ad7 sequence. A similar hairpin in Ad3 contains one fewer base pair due to a G/G mismatch at nucleotides -5 and 20; (c) an alternative hairpin which can form in Ad2 in an analogous manner to that for Ad7, shown in (d). (d) A more stable hairpin formed from Ad7. In (b) and (d) the Ad7 nucleotides that differ from those in Ad2 are indicated by an asterisk.
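Candidate stem-loops of the kind drawn in Fig. 3 correspond to inverted repeats in a single strand, and a simple search for them can be sketched as follows. This is a deliberately minimal illustration, not the procedure used to draw the figure: the function name `best_hairpin`, the loop-size limits and the test sequence are all hypothetical, only perfect Watson-Crick pairs are counted (G-T pairs, bulges and mismatches are ignored), and no stability calculation is attempted.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def best_hairpin(seq, min_loop=3, max_loop=8):
    """Find the longest perfect stem closing a loop of min_loop..max_loop bases.

    Returns (stem_length, left_arm_start, right_arm_end, loop_length) with
    0-based inclusive positions, or (0, None, None, None) if no pair is found.
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, None, None, None)
    for loop_len in range(min_loop, max_loop + 1):
        for loop_start in range(1, n - loop_len):
            i = loop_start - 1         # innermost base of the left arm
            j = loop_start + loop_len  # innermost base of the right arm
            stem = 0
            # Extend the stem outward while the bases are complementary.
            while i >= 0 and j < n and (seq[i], seq[j]) in WATSON_CRICK:
                stem += 1
                i -= 1
                j += 1
            if stem > best[0]:
                best = (stem, i + 1, j - 1, loop_len)
    return best


# Hypothetical test sequence: arm GCATC, loop TTTT, arm GATGC, flanked by AA.
toy_sequence = "AAGCATCTTTTGATGCAA"
print(best_hairpin(toy_sequence))
```

A search restricted to perfect stems would miss structures such as those in Fig. 3 that rely on G-T pairs or tolerate an internal mismatch, so the result should be read as a lower bound on what can be drawn by hand.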


Fig. 4. Possible hairpin structures formed between the promoter and 5'-end of several eukaryotic mRNAs. The "TATA boxes" and 5'-ends of the mRNAs are indicated by lighter and darker shading, respectively. Potential regions of complementarity between the 5'-end of the mRNA and the 3'-end of 18S rRNA are indicated by short vertical lines on the right arm of the cruciform. (A) Early region 1A (E1a) of Ad5 (Van Ormondt et al., 1978). The comparable region of Ad7 and Ad12 can also be arranged in this manner (Dijkema et al., 1980; Sugisaki et al., 1980). (B) The rabbit β-globin gene (Hardison et al., 1979). (C) The silk fibroin gene from Bombyx (Tsujimoto and Suzuki, 1979). (D) The chicken ovalbumin gene (Gannon et al., 1979).

proximity and that the RNA transcript starts at the first purine nucleotide downstream from the stem (see Figs. 3 and 4). A related view of these stem-loop structures is that they indicate a dyad axis of symmetry in the region between the "Hogness box" and the start of the mRNA. Several other examples of prominent dyad symmetry are found both in adenovirus and in some other systems, and they are depicted as stem-loop structures in Fig. 4 for ease of presentation (Konkel et al., 1978; Gannon et al., 1979; Tsujimoto and Suzuki, 1979; Hardison et al., 1979). A similar dyad axis is, however, not readily apparent in some examples of promoter-RNA start pairs that have been sequenced (Baker et al., 1979; Fiers et al., 1978; Ghosh et al., 1978a, b; Haegeman and Fiers, 1978; Reddy et al., 1978; Smith et al., 1979; Tonegawa et al., 1978). At least in the early region of SV40, neither the nucleotide sequence between the promoter and the 5'-end of the mRNA nor the presence or absence of a stem-loop seems to be important (Benoist and Chambon, 1980; Gluzman et al., 1980; Ghosh et al., 1981). However, this dyad axis may enhance the activity of certain promoters over others. For example, experiments with in vitro transcription systems show strong activity only at the major late promoter, which has the dyad symmetry, and not at the nearby IVa2 promoter (Weil et al., 1979; Manley et al., 1980; S-L. Hu, personal communication).

A 9 bp region of complementarity between the 5'-end of the first late mRNA leader of Ad2 and the 3'-end of 18S rRNA has been noted and postulated to play a role in the binding of mRNAs to ribosomes (Ziff and Evans, 1978). A cytosine to guanine change at base +12 in the sequence of Ad3 or Ad7 reduces the continuous complementarity to seven nucleotides. This may not greatly affect the ability of the mRNA to bind to the ribosome because other examples of possible interactions between the 3'-end of 18S rRNA and the 5'-end of eukaryotic messenger RNAs involve even shorter complementary regions (Hagenbüchle et al., 1978).

Another surprising aspect of the sequence obtained was the high degree of homology of Ad2, Ad3, and Ad7 DNA in the intervening sequence to the right of the first leader segment as well as to the left of the Hogness box. Our preliminary results show that an open reading frame is encoded by the l-strand of Ad7 DNA, starting to the right of genome coordinate 19 and continuing leftward in frame with the IVa2 mRNA to a stop codon at coordinate 15.1 in the intervening sequence removed from the IVa2 messenger RNA. Examination of the Ad2 sequence in this region (Baker and Ziff, 1980) reveals a similar open reading frame. The divergent nucleotides between Ad2 and Ad7 (or Ad3) are primarily in the third codon positions in the open reading frames. A set of high molecular weight proteins has recently been correlated with a family of spliced l-strand transcripts synthesized at early times from a promoter at genome coordinate 75 and having message bodies extending leftward from coordinates 30, 26, or 23 to 11 (Stillman et al., 1981). The hypothetical protein we noted from the DNA sequence might be encoded by one of these mRNAs. Work is also in progress to determine the class B viral DNA sequences between coordinates 18 and 29, spanning the second and third RNA leader coding regions, as well as in the IVa2 and peptide IX genes.
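The observation that divergent nucleotides fall mainly in third codon positions can be checked by sorting mismatches according to their position within the reading frame. The sketch below is a hypothetical illustration: it assumes two equal-length sequences already oriented 5' to 3' along the coding strand and aligned without gaps, the function name and the `frame_offset` parameter are inventions of this example, and the twelve-nucleotide test strings are not adenovirus sequences.

```python
def mismatches_by_codon_position(orf_a, orf_b, frame_offset=0):
    """Count mismatches at codon positions 1, 2 and 3.

    frame_offset is the 0-based index of the first base of the first
    complete codon; bases before it are ignored.
    """
    if len(orf_a) != len(orf_b):
        raise ValueError("sequences must be the same length")
    counts = {1: 0, 2: 0, 3: 0}
    for idx, (a, b) in enumerate(zip(orf_a.upper(), orf_b.upper())):
        if idx < frame_offset:
            continue
        if a != b:
            counts[(idx - frame_offset) % 3 + 1] += 1
    return counts


# Hypothetical toy reading frames differing only at third positions:
# ATG GCT AAA GGT  versus  ATG GCC AAG GGA
toy_counts = mismatches_by_codon_position("ATGGCTAAAGGT", "ATGGCCAAGGGA")
print(toy_counts)
```

For a leftward-reading frame such as the one described above, the sequences would first have to be converted to the reverse complement of the rightward strand before being passed to a function of this kind.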

ACKNOWLEDGEMENTS

We thank John M. Scott and Robert LaPorta for technical assistance and Marie Moschitta and Michael Ockler for excellent assistance with manuscript preparation. Drs. S. Zain, J. Rosen, E. Ohtsubo, and D. Sciaky provided useful discussions and advice on DNA-sequencing protocols. This research was sponsored by the National Cancer Institute Program Project Grant CA13106 to Cold Spring Harbor Laboratory. J.A.E. is the recipient of a postdoctoral fellowship from the Leukemia Society of America.

REFERENCES

Akusjärvi, G. and Pettersson, U.: Sequence analysis of adenovirus DNA: complete nucleotide sequence of the spliced 5' noncoding region of adenovirus 2 hexon messenger RNA. Cell 16 (1979a) 841-850.
Akusjärvi, G. and Pettersson, U.: Sequence analysis of adenovirus DNA IV. The genomic sequences encoding the common tripartite leader of late adenovirus messenger RNA. J. Mol. Biol. 134 (1979b) 143-158.
Baker, C.C. and Ziff, E.: Biogenesis, structures, and sites of encoding of the 5' termini of adenovirus-2 mRNA. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 415-428.
Baker, C.C., Hérissé, J., Courtois, G., Galibert, F. and Ziff, E.: Messenger RNA for the Ad2 DNA binding protein: DNA sequences encoding the first leader and heterogeneity at the mRNA 5' end. Cell 18 (1979) 569-580.
Barnes, W.M.: Plasmid detection and sizing in single colony lysates. Science 195 (1977) 393-394.
Benoist, C. and Chambon, P.: Deletions covering the putative promoter region of early mRNAs of simian virus 40 do not abolish T-antigen expression. Proc. Natl. Acad. Sci. USA 77 (1980) 3865-3869.
Berget, S.M., Moore, C. and Sharp, P.A.: Spliced segments at the 5' terminus of adenovirus-2 late mRNA. Proc. Natl. Acad. Sci. USA 74 (1977) 3171-3175.
Berk, A.J. and Sharp, P.A.: Structure of the adenovirus 2 early mRNAs. Cell 14 (1978) 695-711.
Boyer, H.W. and Roulland-Dussoix, D.: A complementation analysis of the restriction and modification of DNA in Escherichia coli. J. Mol. Biol. 41 (1969) 459-472.
Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. and Chambon, P.: Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc. Natl. Acad. Sci. USA 75 (1978) 4853-4857.
Chow, L.T., Gelinas, R.E., Broker, T.R. and Roberts, R.J.: An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 12 (1977) 1-8.
Chow, L.T. and Broker, T.R.: The spliced structures of adenovirus 2 fiber message and other late mRNAs. Cell 15 (1978) 497-510.
Chow, L.T., Broker, T.R. and Lewis, J.B.: The complex splicing patterns of RNA from the early regions of adenovirus-2. J. Mol. Biol. 134 (1979) 265-303.
Chow, L.T., Lewis, J.B. and Broker, T.R.: RNA transcription and splicing at early and intermediate times after adenovirus-2 infection. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 401-414.
Dijkema, R., Dekker, B.M.M. and Van Ormondt, H.: The nucleotide sequence of the transforming BglII-H fragment of adenovirus type 7 DNA. Gene 9 (1980) 141-156.
Efstratiadis, A., Vournakis, J.N., Donis-Keller, H., Chaconas, G., Dougall, D.K. and Kafatos, F.C.: End-labeling of enzymatically decapped mRNA. Nucl. Acids Res. 4 (1977) 4165-4174.
Engler, J.A. and Kilpatrick, B.A.: Cloning and characterization of class B adenoviruses Ad3 and Ad7 DNA. Gene 13 (1981) 125-132.
Evans, R.M., Fraser, N., Ziff, E., Weber, J., Wilson, M. and Darnell, J.E.: The initiation sites for RNA transcription in Ad2 DNA. Cell 12 (1977) 733-739.
Ferguson, J. and Davis, R.W.: An electron microscopic method for studying and mapping the region of weak sequence homology between simian virus 40 and polyoma DNAs. J. Mol. Biol. 94 (1975) 135-149.
Fiers, W., Contreras, R., Haegeman, G., Rogiers, R., Van de Voorde, A., Van Heuverswyn, H., Van Herreweghe, J., Volckaert, G. and Ysebaert, M.: Complete nucleotide sequence of SV40 DNA. Nature 273 (1978) 113-120.
Gannon, F., O'Hare, K., Perrin, F., LePennec, J., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B. and Chambon, P.: Organisation and sequences at the 5' end of a cloned complete ovalbumin gene. Nature 278 (1979) 428-434.
Garon, C.F., Berry, K.W., Hierholzer, J.S. and Rose, J.A.: Mapping of base sequence heterologies between genomes from different adenovirus serotypes. Virology 54 (1973) 414-426.
Ghosh, P., Reddy, V., Swinscoe, J., Choudary, P., Lebowitz, P. and Weissman, S.: The 5'-terminal leader sequence of late 16S mRNA from cells infected with simian virus 40. J. Biol. Chem. 253 (1978a) 3643-3647.
Ghosh, P., Reddy, V., Swinscoe, J., Lebowitz, P. and Weissman, S.: The heterogeneity and 5' terminal structures of the late RNAs of simian virus 40. J. Mol. Biol. 126 (1978b) 813-846.
Ghosh, P.K., Lebowitz, P., Frisque, R.J. and Gluzman, Y.: Identification of a promoter component involved in positioning the 5'-termini of the simian virus 40 early mRNAs. Proc. Natl. Acad. Sci. USA 78 (1981) 100-104.
Gluzman, Y., Sambrook, J.F. and Frisque, R.J.: Expression of early genes of origin-defective mutants of SV40. Proc. Natl. Acad. Sci. USA 77 (1980) 3898-3902.
Green, M., Mackey, J.K., Wold, W.S.M. and Rigden, P.: Thirty-one human adenovirus serotypes (Ad1-Ad31) form five groups (A-E) based upon DNA genome homologies. Virology 93 (1979) 481-492.
Haegeman, G. and Fiers, W.: Localization of the 5' terminus of late SV40 mRNA. Nucl. Acids Res. 5 (1978) 2359-2371.
Hagenbüchle, O., Santer, M., Steitz, J.A. and Mans, R.J.: Conservation of the primary structure at the 3' end of 18S rRNA from eukaryotic cells. Cell 13 (1978) 551-563.
Hardison, R.C., Butler, E.T., III, Lacy, E., Maniatis, T., Rosenthal, N. and Efstratiadis, A.: The structure and transcription of four linked rabbit β-like globin genes. Cell 18 (1979) 1285-1297.
Howley, P.M., Israel, M.A., Law, M.-F. and Martin, M.A.: A rapid method for detecting and mapping homology between heterologous DNAs. J. Biol. Chem. 254 (1979) 4876-4883.
Hutton, J.R. and Wetmur, J.G.: Effect of chemical modification on the rate of renaturation of deoxyribonucleic acid. Deaminated and glyoxalated deoxyribonucleic acid. Biochemistry 12 (1973) 558-563.
Kilpatrick, B.A., Gelinas, R.E., Broker, T.R. and Chow, L.T.: Comparison of late mRNA splicing among class B and class C adenoviruses. J. Virol. 30 (1979) 899-912.
Kitchingman, G.R., Lai, S.-P. and Westphal, H.: Loop structures in hybrids of early RNA and the separated strands of adenovirus DNA. Proc. Natl. Acad. Sci. USA 74 (1977) 4392-4395.
Konkel, D.A., Tilghman, S.M. and Leder, P.: The sequence of the chromosomal mouse β-globin major gene: homologies in capping, splicing, and poly(A) sites. Cell 15 (1978) 1125-1132.
Laird, C.D., McConaughy, B.L. and McCarthy, B.J.: Rate of fixation of nucleotide substitutions in evolution. Nature 224 (1969) 149-154.
Lerner, M.R., Boyle, J.A., Mount, S.M. and Steitz, J.A.: Are snRNPs involved in splicing? Nature 283 (1980) 220-224.
Lewis, J.B. and Mathews, M.B.: Control of adenovirus gene expression: a class of immediate early products. Cell 21 (1980) 303-313.
Lewis, J.B., Esche, H., Smart, J.E., Stillman, B., Harter, M.L. and Mathews, M.B.: The organization and expression of the left third of the adenoviral genome. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 493-504.
Mandel, M. and Higa, A.: Calcium dependent bacteriophage DNA infection. J. Mol. Biol. 53 (1970) 159-162.
Manley, J., Fire, A., Cano, A., Sharp, P.A. and Gefter, M.L.: DNA-dependent transcription of adenovirus genes in a soluble whole-cell extract. Proc. Natl. Acad. Sci. USA 77 (1980) 3855-3859.
Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages. In L. Grossman and K. Moldave (Eds.), Methods in Enzymology, Vol. 65, Academic Press, New York, 1980, pp. 499-560.
Perricaudet, M., Akusjärvi, G., Virtanen, A. and Pettersson, U.: Structure of two spliced mRNAs from the transforming region of human subgroup C adenoviruses. Nature 281 (1979) 694-696.
Reddy, V., Thimmappaya, B., Dhar, R., Subramanian, K., Zain, S., Pan, J., Ghosh, P., Celma, M. and Weissman, S.: The genome of SV40 DNA. Science 200 (1978) 494-500.
Richardson, C.C.: Phosphorylation of nucleic acid by an enzyme from T4 bacteriophage-infected Escherichia coli. Proc. Natl. Acad. Sci. USA 54 (1965) 158-165.
Rogers, J. and Wall, R.: A mechanism for RNA splicing. Proc. Natl. Acad. Sci. USA 77 (1980) 1877-1879.
Roberts, R.J.: Restriction and modification enzymes and their recognition sequences. Gene 8 (1980) 329-343.
Sambrook, J., Sleigh, M., Engler, J.A. and Broker, T.R.: The evolution of the adenoviral genome. In P. Palese and B. Roizman (Eds.), Genetic Variation of Viruses, Ann. New York Acad. Sci. 354 (1980) 426-452.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Seif, I., Khoury, G. and Dhar, R.: BKV splice sequences based on analysis of preferred donor and acceptor sites. Nucl. Acids Res. 6 (1979) 3387-3398.
Smiley, B.L. and Warner, R.C.: Heteroduplexes of φX174 and G4 DNAs: orientation to genetic map and comparison with predictions from nucleotide sequences. Nucl. Acids Res. 6 (1979) 1979-1991.
Smith, H.O. and Birnstiel, M.L.: A simple method for DNA restriction site mapping. Nucl. Acids Res. 3 (1976) 2187-2198.
Smith, M., Leung, D.W., Gillam, S., Astell, C.R., Montgomery, C.L. and Hall, B.D.: Sequence for the gene for iso-1-cytochrome c in Saccharomyces cerevisiae. Cell 16 (1979) 753-761.
Soeda, E., Arrand, J.R., Smolar, N., Walsh, J.E. and Griffin, B.E.: Coding potential and regulatory sequences of the polyoma virus genome. Nature 283 (1980) 445-454.
Stillman, B.W., Lewis, J.B., Chow, L.T., Mathews, M.B. and Smart, J.E.: Identification of the gene and messenger RNA for the adenovirus terminal protein precursor. Cell 23 (1981) 497-508.
Sugisaki, H., Sugimoto, K., Takanami, M., Shiroki, K., Saito, I., Shimojo, H., Sawada, Y., Uemizu, Y., Uesugi, S. and Fujinaga, K.: Structure and gene organization in the transforming HindIII-G fragment of Ad12. Cell 20 (1980) 777-786.
Sutcliffe, J.G.: Complete nucleotide sequence of the Escherichia coli plasmid pBR322. Cold Spring Harbor Symp. Quant. Biol. 43 (1979) 77-90.
Tabak, H.F. and Flavell, R.A.: A method for the recovery of DNA from agarose gels. Nucl. Acids Res. 5 (1978) 2321-2332.
Tibbetts, C.: Physical organization of subgroup B human adenovirus genomes. J. Virol. 24 (1977) 564-579.
Tonegawa, S., Maxam, A., Tizard, R., Bernard, O. and Gilbert, W.: Sequence of a mouse germ-line gene for a variable region of an immunoglobulin light chain. Proc. Natl. Acad. Sci. USA 75 (1978) 1485-1489.
Tsujimoto, Y. and Suzuki, Y.: Structural analysis of the fibroin gene at the 5' end and its surrounding regions. Cell 16 (1979) 425-436.
Van Ormondt, H., Maat, J., De Waard, A. and Van der Eb, A.J.: The nucleotide sequence of the transforming HpaI-E fragment of adenovirus type 5 DNA. Gene 4 (1978) 309-328.
Weil, P.A., Luse, D.S., Segall, J. and Roeder, R.G.: Selective and accurate initiation of transcription at the Ad2 major late promoter in a soluble system dependent on purified RNA polymerase II and DNA. Cell 18 (1979) 469-484.
Wetmur, J. and Davidson, N.: The kinetics of renaturation of DNA. J. Mol. Biol. 31 (1968) 349-370.
Yang, R.C.A. and Wu, R.: BK virus DNA sequence: extent of homology with simian virus 40 DNA. Proc. Natl. Acad. Sci. USA 76 (1979) 1179-1183.
Zain, S., Gingeras, T.R., Bullock, P., Wong, G. and Gelinas, R.E.: Determination and analysis of adenovirus-2 DNA sequences which may include signals for late messenger RNA processing. J. Mol. Biol. 135 (1979a) 413-433.
Zain, S., Sambrook, J., Roberts, R.J., Keller, W., Fried, M. and Dunn, A.R.: Nucleotide sequence analysis of the leader segments of a cloned copy of adenovirus 2 fiber mRNA. Cell 16 (1979b) 851-861.
Ziff, E.B. and Evans, R.M.: Coincidence of the promoter and capped 5' terminus of RNA from the adenovirus 2 major late transcription unit. Cell 15 (1978) 1463-1475.

Communicated by A.J. van der Eb.