The nucleotide sequence of the gene for protein IVa2 and of the 5' leader segment of the major late mRNAs of adenovirus type 5


Gene, 16 (1981) 179-189 Elsevier/North-Holland Biomedical Press

(RNA coordinates; reverse transcription; coding capacity; Maxam-Gilbert technique; chain terminators)

C.P. van Beveren *, J. Maat *, B.M.M. Dekker and H. van Ormondt **

Department of Medical Biochemistry, Sylvius Laboratories, University of Leiden, Wassenaarseweg 72, 2333 AL Leiden (The Netherlands)

(Received June 18th, 1981)
(Accepted September 15th, 1981)

SUMMARY

We present here the primary structure of the region of human adenovirus 5 (Ad5) DNA from nucleotide 4001 through the HindIII site at nucleotide 6246 (map positions 0.11 to 0.17). The corresponding region in the closely related adenovirus type 2 (Ad2) encodes the spliced mRNA for viral protein IVa2 (Chow et al., 1979; Persson et al., 1979). Reverse transcription of the Ad5 pIVa2 mRNA localized the 5' terminus of the mRNA to approximately position 5840, and its splice coordinates to positions 5706 and 5427. From the data of Aleström et al. (1980) for Ad2, the 3' end of this mRNA was inferred to be specified by Ad5 nucleotide 4060. The nucleic acid data allow us to predict an Mr of 50 873 for the IVa2 protein of Ad5, which is close to the experimentally determined value of 50 000 (Persson et al., 1979). The DNA sequence described here also includes the information for the 5'-terminal leader segment of the major late mRNAs of Ad5.

INTRODUCTION

Adenoviruses provide a useful model for cell transformation and oncogenesis, as well as for the control of gene expression in eukaryotic cells. In the past decade our knowledge of the molecular biology of adenoviruses has increased impressively (for reviews see Wold et al. (1978), Philipson (1979) and Tooze (1980)).

The expression of the linear adenovirus genome is temporally regulated. The functions of the early gene products are not yet fully understood but as a result of their expression the cell becomes primed for the synthesis of progeny virus. One cluster of early genes (region E1) is located at the left terminus (map positions 0.013-0.111); it is transcribed rightwards from the r strand (Fig. 1). Neighbouring, and partially overlapping, it are two genes which are expressed at an intermediate stage of infection, namely those for viral polypeptides pIX and pIVa2. The information for pIX is situated entirely within early region E1; however, it has its own promoter (Wilson et al., 1979; Maat et al., 1980) which controls the (rightward) transcription of the unspliced pIX mRNA (Aleström et al., 1980). The gene for pIVa2 is transcribed leftward from the l strand and has been mapped in Ad2 between map positions 0.161 and 0.111 (Fig. 1). The genes for pIX and pIVa2 show an overlap of some 10 nucleotides (Aleström et al., 1980). The cytoplasmic mRNA for pIVa2 is spliced between map positions 0.156 and 0.149 (Chow et al., 1979). The pIVa2 gene product, a maturation protein involved in virion assembly (Persson et al., 1979), has an Mr of 50 000, as estimated by SDS-polyacrylamide electrophoresis.

We have embarked upon the determination of the primary structure of the left terminus of the Ad5 DNA (total genome length: 36 500 bp), with the primary aim of elucidating the organization of early region E1 which contains the genes responsible for cell transformation (Van der Eb et al., 1980). The nucleotide sequence of the transforming region is now completed and has been described in a series of reports (Van Ormondt et al., 1978; Maat and Van Ormondt, 1979; Maat et al., 1980; Van Ormondt et al., 1980). The present paper deals with the Ad5 DNA sequence between map positions 0.11 (nucleotide 4001) and 0.17 (nucleotide 6246, HindIII site), which should contain the information for Ad5 polypeptide IVa2 (see above). It will be shown that the determined sequence indeed harbours all the required regulatory elements and sufficient reading space to code for a 50 kd protein. In addition, we have located the Ad5 major late promoter and the first leader segment for late mRNA, which have been mapped in the genome of the closely related Ad2 at map position 0.163 (Chow et al., 1977; Berget et al., 1977; Ziff and Evans, 1978).

[Fig. 1 graphic not reproduced. The map shows scales in % genome length (5-15) and base pairs, SmaI, HindIII and XhoI restriction sites with the flanking fragment letters, and the E1a, E1b, pIX, pIVa2 and major late transcription units.]

Fig. 1. Organization of the left terminus of Ad5 DNA. In the middle line a few restriction sites have been indicated; the letters flanking these sites are the designations of the restriction fragments. The hatched rectangle denotes the region whose sequence is given in this paper. In the bottom part of the figure the transcription units identified in the left-terminal 17% of the Ad5 genome are indicated.

* Present addresses: (C.P.v.B.) The Salk Institute, La Jolla, CA 92112, U.S.A.; (J.M.) Unilever Research Laboratory, Vlaardingen, The Netherlands.
** To whom all correspondence should be addressed.

Abbreviations: Ad, adenovirus; AMV, avian myeloblastosis virus; bp, base pairs; ddNTP, dideoxynucleoside triphosphate; kd, kilodalton; pIVa2, viral polypeptide IVa2; SDS, sodium dodecylsulfate.

0378-1119/81/0000-0000/$02.75 © 1981 Elsevier/North-Holland Biomedical Press
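The map positions quoted throughout are fractions of the genome length, so they can be related to nucleotide numbers with simple arithmetic. The following is a minimal sketch, assuming that a map position is just the nucleotide number divided by the 36 500 bp genome length given above; the function names are illustrative and not part of the original work.

```python
# Assumption: map position = nucleotide number / total genome length,
# using the Ad5 genome length of 36 500 bp quoted in the text.
GENOME_LENGTH_BP = 36_500


def map_position(nucleotide: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Return the fractional map position of a nucleotide coordinate."""
    return nucleotide / genome_length


def nucleotide_at(position: float, genome_length: int = GENOME_LENGTH_BP) -> int:
    """Return the approximate nucleotide coordinate of a map position."""
    return round(position * genome_length)


# The two ends of the sequenced region, rounded to two decimals.
left_end = round(map_position(4001), 2)
right_end = round(map_position(6246), 2)
print(left_end, right_end)
```

Rounded to two decimals, the two ends of the sequenced region reproduce the map positions 0.11 and 0.17 used in the text.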

MATERIALS AND METHODS

(a) DNA and enzymes

The preparation and isolation of Ad5 DNA and restriction fragments, the preparation and sources of enzymes and other materials, and the methods for electrophoretic fractionation have been described previously (Van Ormondt et al., 1978; Maat and Van Ormondt, 1979). Endonucleases SinI (isoschizomer of AvaII; Lupker and Dekker, 1981) and AcyI (De Waard et al., 1978) were gifts of Ms. M. Lupker and Dr. de Waard, and AMV reverse transcriptase was a gift of Dr. Beard (St. Petersburg, FL) to P.J. van den Elsen of this laboratory. AccI was purchased from New England Biolabs (Beverly, MA), and nuclease S1 from Sigma Biochemicals (St. Louis, MO).

(b) Sequencing procedures

Sequence analysis followed the procedures of Maxam and Gilbert (1977; 1980), Maat and Smith (1978) and, in some instances, of Sanger et al. (1977). On two occasions when the gel techniques

[Fig. 2 graphic not reproduced. The diagram shows the individual sequenced fragments as labelled bars along the l- and r-strands.]

Fig. 2. Schematic representation of the sequenced tracts of DNA in the l-strand (l) and r-strand (r) of the region between map positions 0.113 (SmaI NG junction) and 0.17 (HindIII EC junction). The sequence of nucleotides 4001-4125 was described previously (Maat et al., 1980). In the fragment names, the first letter denotes the endonuclease used for the primary cleavage, the second letter the enzyme used for the secondary restriction cut to separate two end-labeled segments of the primary restriction fragments. The following abbreviations were used: A-AluI; B-MboII; D-PvuII; F-HinfI; H-HhaI; I-[illegible]; M-MboI; O-AosII; P-HphI; S-[illegible]; T-TaqI; U-AsuI; V-[illegible]; X-XhoI; Y-HpaII; Z-HaeIII. E.g., UaHs is the result of HhaI cleavage of AsuI fragment Ua; it is that portion of Ua that overlaps with Hs. The fragments reading across the SmaI site at map position 0.113, e.g. (E)F5M4, were obtained by cleavage of HindIII-E. Fragments indicated with a dot were sequenced using chain terminators. The lines at the bottom of the figure show those portions of the l- and r-strands that have been sequenced.

4090 4100 CTTGCTGTCT TTATTTAGGG GAACGACAGA AATAAATCCC term pIVa2 4170 4180 4190 4200 TTTTCCAGGA CGTGGTAAAG GTGACTCTGG ATGTTCAGAT AAAAGGTCCT GCACCATTTC CACTGAGACC TACAAGTCTA

4070$f2;;;'4080 GATTTGGATC AAGCAAGTGT CTAAACCTAG TTCGTTCACA

5210 5220 5230 5240 5250 5260 5270 5280 5290 5300 GGCATCTCGA TCCAGCATAT CTCCTCGTTT CGCGGGTTGG GGCGGCTTTC GCTGTACGGC AGTAGTCGGT GCTCGTCCAG ACGGGCCAGG GTCATGTCTT CCGTAGAGCT AGGTCGTATA GAGGAGCAAA GCGCCCAACC CCGCCGAAAG CGACATGCCG TCATCAGCCA CGAGCAGGTC TGCCCGGTCC CAGTACAGAA

5110 5120 5130 5140 5150 5160 5170 5180 5190 5200 CAAAGTTTTT CAACGGTTTG. AGACCGTCCG CCGTAGGCAT GCTTTTGAGC GTTTGACCAA GCAGTTCCAG GCGGTCCCAC AGCTCGGTCA CCTGCTCTAC GTTTCAAAAA GTTGCCAAAC TCTGGCAGGC GGCATCCGTA CGAAAACTCG CAAACTGGTT CGTCAAGGTC CGCCAGGGTG TCGAGCCAGT GGACGAGATG

5010 5020 5030 5040 5050 5060 5070 5080 5090 5100 GGGGCCACTT CGTTAAGCAT GTCCCTGACT CGCATGTTTT CCCTGACCAA ATCCGCCAGA AGGCGCTCGC CGCCCAGCGA TAGCAGTTCT TGCAAGGAAG CCCCGGTGAA GCAATTCGTA CAGGGACTGA GCGTACAAAA GGGACTGGTT TAGGCGGTCT TCCGCGAGCG GCGGGTCGCT ATCGTCAAGA ACGTTCCTTC

4910 4920 4930 4940 4950 4960 4970 4980 4990 5000 CAGCTGCGAC TTACCGCAGC CGGTGGGCCC GTAAATCACA CCTATTACCG GGTGCAACTG GTAGTTAAGA GAGCTGCAGC TGCCGTCATC CCTGAGCAGG GTCGACGCTG AATGGCGTCG GCCACCCGGG CATTTAGTGT GGATAATGGC CCACGTTGAC CATCAATTCT CTCGACGTCG ACGGCAGTAG GGACTCGTCC

4810 4820 4830 4840 4850 4860 4870 4880 4890 4900 CTTTGAGTTC AGATGGGGGG ATCATGTCTA CCTGCGGGGC GATGAAGAAA ACGGTTTCCG GGGTAGGGGA GATCAGCTGG GAAGAAAGCA GGTTCCTGAG GAAACTCAAG TCTACCCCCC TAGTACAGAT GGACGCCCCG CTACTTCTTT TGCCAAAGGC CCCATCCCCT CTAGTCGACC CTTCTTTCGT CCAAGGACTC

4710 4720 4730 4740 4750 4760 4770 4780 4790 4800 GCCATTTTTA CAAAGCGCGG GCGGAGGGTG CCAGACTGCG GTATAATGGT TCCATCCGGC CCAGGGGCGT AGTTACCCTC ACAGATTTGC ATTTCCCACG CGGTAAAAAT GTTTCGCGCC CGCCTCCCAC GGTCTGACGC CATATTACCA AGGTAGGCCG GGTCCCCGCA TCAATGGGAG TGTCTAAACG TAAAGGGTGC

4610 4620 4630 4640 4650 4660 4670 4680 4690 4700 GTCCATAATG ATGGCAATGG GCCCACGGGC GGCGGCCTGG GCGAAGATAT TTCTGGGATC ACTAACGTCA TAGTTGTGTT CCAGGATGAG ATCGTCATAG CAGGTATTAC TACCGTTACC CGGGTGCCCG CCGCCGGACC CGCTTCTATA AAGACCCTAG TGATTGCAGT ATCAACACAA GGTCCTACTC TAGCAGTATC

4510 4520 4530 4540 4550 4560 4570 4580 4590 4600 TGTATCCGGT GCACTTGGGA AATTTGTCAT GTAGCTTAGA AGGAAATGCG TGGAAGAACT TGGAGACGCC CTTGTGACCT CCAAGATTTT CCATGCATTC ACATAGGCCA CGTGAACCCT TTAAACAGTA CATCGAATCT TCCTTTACGC ACCTTCTTGA ACCTCTGCGG GAACACTGGA GGTTCTAAAA GGTACGTAAG

4410 4420 4430 4440 4450 4460 4470 4480 4490 4500 CGTGGGGATA TGAGATGCAT CTTGGACTGT ATTTTTAGGT TGGCTATGTT CCCAGCCATA TCCCTCCGGG GATTCATGTT GTGCAGAACC ACCAGCACAG GCACCCCTAT ACTCTACGTA GAACCTGACA TAAAAATCCA ACCGATACAA GGGTCGGTAT AGGGAGGCCC CTAAGTACAA CACGTCTTGG TGGTCGTGTC

4310 4320 4330 4340 4350 4360 4370 4380 4390 4400 GGCGTGGTGC CTAAAAATGT CTTTCAGTAG CAAGCTGATT GCCAGGGGCA GGCCCTTGGT GTAAGTGTTT ACAAAGCGGT TAAGCTGGGA TGGGTGCATA CCGCACCACG GATTTTTACA GAAAGTCATC GTTCGACTAA CGGTCCCCGT CCGGGAACCA CATTCACAAA TGTTTCGCCA ATTCGACCCT'ACCCACGTAT

4210 4220 4230 4240 4250 4260 4270 4280 4290 4300 ACATGGGCAT AAGCCCGTCT CTGGGGTGGA GGTAGCACCA CTGCAGAGCT TCATGCTGCG GGGTGGTGTT GTAGATGATC CAGTCGTAGC AGGAGCGCTG TGTACCCGTA TTCGGGCAGA GACCCCACCT CCATCGTGGT GACGTCTCGA AGTACGACGC CCCACCACAA CATCTACTAG GTCAGCATCG TCCTCGCGAC

4050 4060 AAAAAACCAG ACTCTGTTTG TTTTTTGGTC TGAGACAAAC IVd,poly (AI cl 4110 4120 4130 4140 4150 4160 GTTTTGCGCG CGCGGTAGGC CCGGGACCAG CGGTCTCGGT CGTTGAGGGT CCTGTGTATT CAAAACGCGC GCGCCATCCG GGCCCTGGTC GCCAGAGCCA GCAACTCCCA GGACACATAA

4010 4020 4030temprx4040 5' GAAGGCTTCC TCCCCTCCCA ATGCGGTTTAAACATAAAT 3' CTTCCGAAGG AGGGGAGGGT TACGCCAAAT TTTGTATTTA 2

5540 5550 5560 5570 5580 5590 5600 GGGCAGTGCA GACTTTTGAG GGCGTAGAGC TTGGGCGCGA GAAATACCGA TTCCGGGGAG TAGGCATCCG CCCGTCACGT CTGAAAACTC CCGCATCTCG AACCCGCGCT CTTTATGGCT AAGGCCCCTC ATCCGTAGGC

5440 5450 5460 5470 5480 5490 5500 CGCGTCGGCC AGGTAGCATT TGACCATGGT GTCATAGTCC AGCCCCTCCG CGGCGTGGCC CTTGGCGCGC GCGCAGCCGG TCCATCGTAA ACTGGTACCA CAGTATCAGG TCGGGGAGGC GCCGCACCGG GAACCGCGCG

5980 5990 6000 TGGTTTGTAG GTGTAGGCCA CGTGACCGGG ACCAAACATC CACATCCGGT GCACTGGCCC major late, 5' splice 6080 6090 6100 GCGAGGGCCA GCTGTTGGGG TGAGTACTCC CGCTCCCGGT CGACAACCCC ACTCATGAGG

Fig. 3. The nucleotide sequence of Ad5 DNA between positions 4001 and 6246 (HindIII site). Since we have deleted one nucleotide from the preceding DNA region (Bos et al., 1981) the nucleotide numbers of the first 125 residues are one lower than those of Maat et al. (1980). In the appropriate strands we have underlined the polyadenylation signals (AATAAA) and sites of the mRNAs for pIVa2 and of the E1b region (and pIX) (Aleström et al., 1980), the "Goldberg-Hogness" box (TATAAAA) for the major late transcription unit and the initiation and termination codons of pIVa2. The 5' termini of the pIVa2 (Baker and Ziff, 1981; this paper) and major late (Baker and Ziff, 1980) mRNAs are also indicated, as are their splice points (see DISCUSSION). The positions where the Ad2 DNA sequence (Baker and Ziff, 1980) differs from that of Ad5 have been marked with arrows.

4 6210 6220 6230 6240 CCGCATCCAT CTGGTCAGAA AAGACAATCT TTTTGTTGTC AAGCTT GGCGTAGGTA GACCAGTCTT TTCTGTTAGA AAAACAACAG TTCGAA

1 6110 6120 6130 6140 6150 6160 6170 6180 6190 6200 CTCTGAAAAG CGGGCATGAC TTCTGCGCTA AGATTGTCAG TTTCCAAAAA CGAGGAGGAT TTGATATTCA CCTGGCCCGC GGTGATGCCT TTGAGGGTGG GAGACTTTTC GCCCGTACTG AAGACGCGAT TCTAACAGTC AAAGGTTTTT GCTCCTCCTA AACTATAAGT GGACCGGGCG CCACTACGGA AACTCCCACC

5950 5960 5970 GTCGCCCTCT TCGGCATCAA GGAAGGTGAT CAGCGGGAGA AGCCGTAGTT CCTTCCACTA r5' end, major late mRNA 6050 6060 6070 TCGTCCTCAC TCTCTTCCGC ATCGCTGTCT AGCAGGAGTG AGAGAAGGCG TAGCGACAGA

5850 5860 5870 5880 5890 5900 CAAAGGCTCG CGTCCAGGCC AGCACGAAGG AGGCTAAGTG GGAGGGGTAG CGGTCGTTGT GTTTCCGAGC GCAGGTCCGG TCGTGCTTCC TCCGATTCAC CCTCCCCATC GCCAGCAACA

5720 5730 5740 5750 5760 5770 5780 5790 5800 GTTTCCATGA GCCGGTGTCC ACGCTCGGTG ACGAAAAGGC TGTCCGTGTC CCCGTATACA GACTTGAGAG GCCTGTCCTC GAGCGGTGTT CAAAGGTACT CGGCCACAGG TGCGAGCCAC TGCTTTTCCG ACAGGCACAG GGGCATATGT CTGAACTCTC CGGACAGGAG CTCGCCACAA

5820 5830 5840 CCTCGTATAG AAACTCGGAC CACTCTGAGA GGAGCATATC TTTGAGCCTG GTGAGACTCT 5'eru?, pIVa2 mRNA &&I 5910 5920 5930 5940 CCACTAGGGG GTCCACTCGC TCCAGGGTGT GAAGACACAT GGTGATCCCC CAGGTGAGCG AGGTCCCACA CTTCTGTGTA Hogness 6010 6020 6030 6040 TGTTCCTGAA GGGGGGCTAT AAAAGGGGGT GGGGGCGCGT ACAAGGACTT CCCCCCGATA TTTTCCCCCA CCCCCGCGCA

5710 CTTACCTCTG GAATGGAGAC 5' spZice 1 5810 CCGCGGTCCT GGCGCCAGGA

5610 5620 5630 5640 5650 5660 5670 5680 5690 5700 CGCCGCAGGC CCCGCAGACG GTCTCGCATT CCACGAGCCA GGTGAGCTCT GGCCGTTCGG GGTCAAAAAC CAGGTTTCCC CCATGCTTTT TGATGCGTTT GCGGCGTCCG GGGCGTCTGC CAGAGCGTAA GGTGCTCGGT CCACTCGAGA CCGGCAAGCC CCAGTTTTTG GTCCAAAGGG GGTACGAAAA ACTACGCAAA

5410 5420 5430 GGTGCTGAAG CGCTGCCGGT CTTCGCCCTG CCACGACTTC GCGACGGCCA GAAGCGGGAC 3' splice I 5510 5520 5530 AGCTTGCCCT TGGAGGAGGC GCCGCACGAG TCGAACGGGA ACCTCCTCCG CGGCGTGCTC

5310 5320 5330 5340 5350 5360 5370 5380 5390 5400 TCCACGGGCG CAGGGTCCTC GTCAGCGTAG TCTGGGTCAC GGTGAAGGGG TGCGCTCCGG GCTGCGCGCT GGCCAGGGTG CGCTTGAGGC TGGTCCTGCT AGGTGCCCGC GTCCCAGGAG CAGTCGCATC AGACCCAGTG CCACTTCCCC ACGCGAGGCC CGACGCGCGA CCGGTCCCAC GCGAACTCCG ACCAGGACGA

yielded irreproducible readings we had to resort to the "wandering spot" method (Tu et al., 1976) to obtain a definitive answer. The DNA sequence data were processed with the computer programs of Staden (1977; 1978).

The splice point in pIVa2 mRNA was determined by annealing the HphI fragment P5 (nucleotides 5178-5352), which had been made single-stranded by treatment with exonuclease III (Smith, 1979), to Ad5 late poly(A)+ RNA in 50% formamide, 50 mM Tris·HCl, pH 8.3, 1 mM EDTA, 0.6 M KCl and 0.1% SDS (3 min at 65°C, 2 h at 45°C). The primer fragment P5 was extended by a 30-min incubation at 37°C with AMV reverse transcriptase (in 50 mM Tris·HCl, pH 8.3, 50 mM KCl, 5 mM MgCl2, 10 mM dithioerythritol) in the presence of four dNTPs and one ddNTP; the triphosphate concentrations were modified slightly with respect to those given by Zimmern and Kaesberg (1978).

The 5' terminus of pIVa2 mRNA was established by hybridizing a 5'-labeled primer fragment to late poly(A)+ Ad RNA. This primer was extended with AMV reverse transcriptase and dNTPs; the elongated product was run next to a Maxam-Gilbert sequence

[Fig. 4 graphic not reproduced. The map is drawn against scales in bp from the left terminus (4500-6000) and % genome length, with one line of site marks per enzyme; the letters are the fragment designations used in Fig. 2.]

Enzymes mapped (recognition sequence, with ' marking the cleavage position): Y-HpaII (C'CGG); Z-HaeIII (GG'CC); U-AsuI (G'GNCC); F-HinfI (G'ANTC); T-TaqI (T'CGA); C-FnuDII (CG'CG); A-AluI (AG'CT); H-HhaI (GCG'C); M-MboI ('GATC); B-MboII (GAAGA); P-HphI (GGTGA); R-EcoRII ('CC(A/T)GG); RsaI (GT'AC); DdeI (C'TNAG).

No sites for: AosI, ClaI, HpaI, HindII, EcoRI, BamHI, BglI, BglII, BclI, KpnI, PvuI, XmaIII, EcoRV, SauI, TteI, SalI, MlaI, XbaI.

Fig. 4. Physical map for a number of restriction sites in the pIVa2 region (from SmaI site at map position 0.113 to HindIII site at 0.17) of Ad5 DNA. For MboII, HphI and HgaI, the actual cleavage sites are indicated by arrowheads, relative to each recognition site.

pattern for a fragment that was 5'-labeled at the same site as the primer.

RESULTS

(a) DNA sequence

By use of the sequencing techniques mentioned under MATERIALS AND METHODS we obtained a number of overlapping gel readings (schematically drawn in Fig. 2) which allowed us to deduce the primary structure reported here. More than 95% of the sequence was determined in both DNA strands (Fig. 2). In Fig. 3 the nucleotide sequence of Ad5 DNA from nucleotide 4001 to nucleotide 6246 is given in double-stranded format. The first nucleotides given here have already been described in a previous paper (Maat et al., 1980) but are shown again because they encode the 3' terminus of the IVa2 mRNA, as will be demonstrated below. In the sequence we have indicated the 5' termini of the pIVa2 and major late mRNAs and the suspected promoter for the latter, the initiation and termination codons of polypeptide IVa2 and the splice points in its mRNA, and the poly(A) addition sites and signals in the mRNAs for polypeptides IX and IVa2.

Fig. 4 gives some restriction endonuclease cleavage maps determined from the primary structure. They confirm the crude maps prepared prior to sequencing the DNA. The sites for endonucleases DdeI and RsaI were deduced from the DNA sequence and have not been tested since they were not commercially available when this work was carried out.

(b) mRNA sequence

By annealing HphI fragment P4 (5178-5352) to late Ad5 mRNA and using it as a primer for chain extension with reverse transcriptase and chain terminator triphosphates, we obtained a sequence gel for a section of the mRNA that is transcribed from noncontiguous areas of the viral genome (the bands in the autoradiograph were too faint to allow photographic reproduction). The C residues in positions 259 and 260 of the cDNA correspond to nucleotides C-5427 and C-5706 in the genomic DNA, so that this sequence gel established the donor and acceptor sites for RNA splicing in the primary transcript.

Fig. 5 is the autoradiograph of a Maxam-Gilbert pattern for a portion of the AccI-RsaI fragment (positions 5766-6095) 5'-labeled at the AccI site (5766). Next to the sequence ladder were run the products obtained by primer extension of a shorter single-stranded DNA fragment labeled at the same AccI site (from AccI at 5766 to SinI at 5805) annealed to Ad5 late poly(A)+ RNA. The major extension product is seen to comigrate with residue A-5838, while a slightly less dominant band runs with the same speed as G-5837; two minor bands run with A-5840 and C-5841 (all nucleotides in the l strand). Because the chemical degradation sequencing procedure eliminates the detected nucleotide, DNA migrating at a position corresponding to the Nth nucleotide from the labeled terminus is actually N - 1 nucleotides in length. Thus the 5' terminus of pIVa2 mRNA may be localised to positions 5836 to 5840.

Fig. 5. Autoradiograph of a Maxam-Gilbert sequence ladder for an AccI-RsaI fragment (positions 5766-6095) 5'-labeled at the AccI site (5766). Alongside the sequence lanes, the 5'-labeled AccI-SinI fragment (5766-5805) elongated to the 5' terminus of the pIVa2 messenger with reverse transcriptase and dNTPs is seen to comigrate with bands A-5838, G-5837 (major bands), A-5840 and C-5841 (minor bands).
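The N - 1 correction used to read the primer-extension result can be written out explicitly. This is a minimal sketch, assuming the ladder is numbered upward from the label at the AccI site (5766), so that a product comigrating with the band for nucleotide X ends at X - 1; the function and variable names are illustrative only.

```python
def product_terminus(comigrating_band: int) -> int:
    """Last nucleotide of an extension product that comigrates with a ladder band.

    Chemical cleavage removes the detected nucleotide, so the ladder fragment
    for nucleotide X ends at X - 1 (assumes coordinates increase away from
    the labeled end, as for the fragment labeled at 5766).
    """
    return comigrating_band - 1


# Bands reported for the extension products (coordinates in the l strand).
bands = {"major": [5838, 5837], "minor": [5840, 5841]}

termini = sorted(product_terminus(b) for group in bands.values() for b in group)
print(termini)
```

Under that assumption the four bands give termini spanning 5836 to 5840, which is the range stated in the text.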

DISCUSSION

The nucleotide sequence described in this paper covers the region between map positions 0.11 and 0.17 of Ad5 DNA. In the closely related Ad2 DNA this region codes for viral protein IVa2 (Chow et al., 1979). Because of the sequence homology between these two human adenovirus strains we can legitimately assume that our sequence encodes the IVa2 protein of Ad5.

(a) Coordinates of Ad5 pIVa2 mRNA

We have located the template for the 5' terminus of pIVa2 mRNA between positions 5836 and 5840 (Fig. 5). This is in close agreement with the results of Baker and Ziff (1981), whose reported 5'-terminal nucleotides of Ad2 pIVa2 mRNA correspond to positions 5836 and 5838 in our sequence (Fig. 3). This location for the cap template came as a surprise, since at positions 5979-5974 the r strand contains the sequence TACAAA. In Ad2 the corresponding sequence reads TATAAA (Baker and Ziff, 1980), which is identical to the canonical sequence of the "Goldberg-Hogness" box (Gannon et al., 1979). Since this homologous sequence has been observed to occur about 30 nucleotides upstream from the 5' termini of a number of eukaryotic mRNAs we had expected the cap template for IVa2 mRNA to lie around position 5950.

At positions 4095-4090 we located the sequence AATAAA usually associated with the 3'-poly(A) tail of eukaryotic messengers (Proudfoot and Brownlee, 1976). Aleström et al. (1980) have determined the poly(A) addition site for the pIVa2 mRNA of Ad2. By extrapolating their data to the Ad5 sequence we can predict that the analogous site in the Ad5 DNA sequence should be located at nucleotide 4060.

As is the case with the vast majority of the adenovirus messengers, the mRNA for pIVa2 consists of segments transcribed from non-contiguous regions of the genomic DNA. By sequencing the suspected region of this RNA with reverse transcriptase and dideoxynucleoside triphosphates, we established that the nucleotides corresponding to G-5706 and G-5427 are joined in the mature mRNA. Thus, in summary, the coordinates of pIVa2 mRNA are 5'-5836-5706-splice-5427-4060-3'.

(b) Coding properties

In Fig. 6A we show how the nonsense codons present in the r strand of the determined sequence are distributed in the three possible reading frames, and how they are located relative to the coordinates of the mRNA for pIVa2. The 5' terminus of the mRNA is at position 5836, and the first available AUG triplet (5718) is in frame 1, which is interrupted by a nonsense codon at position 5199. However, at the splice point translation switches from frame 1 to frame 3, which permits protein synthesis up to the termination codon UAA at position 4093. Consequently, the coding information for protein IVa2 lies between AUG-5718 and UAA-4093, and should specify a polypeptide of 449 amino acids (Fig. 7) with Mr = 50 873. This value is in agreement with the Mr of pIVa2 (50 000) reported by Persson et al. (1979). The predicted amino acid composition of pIVa2 is given in Table I. This prediction is based on the assumption that protein synthesis initiates at the 5'-proximal AUG of the mRNA, as is generally the case in eukaryotic systems. The second AUG is 145 nucleotides downstream from the presumed initiator but lies in another frame. It should be noted that the stop codon UAA coincides with the single polyadenylation signal AAUAAA (see above) for the pIVa2 messenger.
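The coordinates above can be checked for internal consistency with a little arithmetic. The sketch below assumes that the quoted positions 5718 and 4093 are the first nucleotides of the initiation and termination codons, that both 5706 and 5427 are retained in the mature mRNA, and that 5836 (one end of the reported 5836-5840 range) is taken as the cap site; all names are illustrative.

```python
def segment_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end, inclusive (either direction)."""
    return abs(start - end) + 1


# Coordinates from the text; the gene is read leftward (decreasing numbers).
CAP, DONOR, ACCEPTOR, POLY_A = 5836, 5706, 5427, 4060
AUG_FIRST, STOP_FIRST = 5718, 4093
STOP_LAST = STOP_FIRST - 2  # assumption: UAA occupies 4093, 4092 and 4091

# Coding nucleotides on either side of the splice, stop codon included.
exon1_coding = segment_length(AUG_FIRST, DONOR)
exon2_coding = segment_length(ACCEPTOR, STOP_LAST)
codons, remainder = divmod(exon1_coding + exon2_coding, 3)
amino_acids = codons - 1  # the final codon is the stop

# An intron whose length is not a multiple of 3 shifts the genomic frame.
intron_length = DONOR - ACCEPTOR - 1
frame_changes = intron_length % 3 != 0

# Length of the mRNA body, excluding the poly(A) tail.
mrna_length = segment_length(CAP, DONOR) + segment_length(ACCEPTOR, POLY_A)

print(amino_acids, remainder, intron_length, frame_changes, mrna_length)
```

With these assumptions the coding region divides into whole codons and gives the 449 residues stated in the text, and the intron length is not a multiple of three, which is consistent with the reported switch of reading frame at the splice point.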

TABLE I

Amino acid composition of pIVa2 predicted from the nucleic acid data

The protein is assumed to initiate at the 5'-proximal AUG triplet on the mRNA.

Amino acid   Number of residues   Amino acid   Number of residues
phe          11                   his          19
leu          50                   gln          24
ile          21                   asn          18
met          16                   lys          20
val          19                   asp          29
ser          24                   glu          19
pro          39                   cys           7
thr          20                   trp           5
ala          33                   arg          35
tyr          14                   gly          26
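The predicted Mr can be approximated directly from the amino acid composition. The sketch below is not the authors' calculation: it assumes the Table I counts as read from the source (phe 11, leu 50, ile 21, met 16, val 19, ser 24, pro 39, thr 20, ala 33, tyr 14, his 19, gln 24, asn 18, lys 20, asp 29, glu 19, cys 7, trp 5, arg 35, gly 26) and uses present-day average residue masses, so small differences from the published figure are to be expected.

```python
# Average residue masses in daltons (amino acid minus one water), rounded.
RESIDUE_MASS = {
    "gly": 57.05, "ala": 71.08, "ser": 87.08, "pro": 97.12, "val": 99.13,
    "thr": 101.11, "cys": 103.14, "leu": 113.16, "ile": 113.16, "asn": 114.10,
    "asp": 115.09, "gln": 128.13, "lys": 128.17, "glu": 129.12, "met": 131.19,
    "his": 137.14, "phe": 147.18, "arg": 156.19, "tyr": 163.18, "trp": 186.21,
}
WATER = 18.02  # one water is added back for the free N- and C-termini

# Composition of pIVa2 as read from Table I (an assumption of this sketch).
COMPOSITION = {
    "phe": 11, "leu": 50, "ile": 21, "met": 16, "val": 19,
    "ser": 24, "pro": 39, "thr": 20, "ala": 33, "tyr": 14,
    "his": 19, "gln": 24, "asn": 18, "lys": 20, "asp": 29,
    "glu": 19, "cys": 7, "trp": 5, "arg": 35, "gly": 26,
}


def molecular_weight(composition: dict[str, int]) -> float:
    """Sum of residue masses plus one water for an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items()) + WATER


total_residues = sum(COMPOSITION.values())
mr = molecular_weight(COMPOSITION)
print(total_residues, round(mr))
```

The residue count sums to the 449 amino acids given in the text, and the estimate falls close to the reported Mr of 50 873.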

[Fig. 6 graphic not reproduced. Panels A and B plot nonsense codons as tick marks in each of the three reading frames against scales in base pairs (4000-6000) and % genome length (11-17); ATG-5718, a frame marked "no ATG" and the major late promoter (P) are labelled.]

Fig. 6. Suggested coding properties of the Ad5 pIVa2 region. (A) Top lines: The nonsense codons for the r-strand (tick marks) are arranged according to their reading frames. Also indicated is ATG-5718 encoding the first available met codon in pIVa2 mRNA, and the "Goldberg-Hogness" box ("P") for the major late transcription unit. The solid black line denotes the postulated coding sequences for polypeptide IVa2, and the hatched rectangle a potential, alternative protein in another reading frame (115 amino acids). The bottom line gives the coordinates of pIVa2 mRNA: 5' terminus (Baker and Ziff, 1981; this paper); splice points (this paper) and 3' terminus (Aleström et al., 1980). (B) Nonsense codon distribution in the l-strand.
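A frame-by-frame map of nonsense codons of the kind summarized in Fig. 6 can be sketched as follows. This is an illustration rather than the program used for the paper: the frame numbering (frame 1 starts at the first nucleotide of the supplied sequence) is an assumption and need not match the figure, and a leftward-transcribed gene such as that for pIVa2 would be scanned on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_codons_by_frame(seq: str, first_coordinate: int = 1) -> dict[int, list[int]]:
    """Map each reading frame (1-3) to the coordinates of its stop codons.

    The coordinate reported is that of the first nucleotide of the codon.
    """
    seq = seq.upper()
    frames: dict[int, list[int]] = {1: [], 2: [], 3: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_CODONS:
            frames[i % 3 + 1].append(first_coordinate + i)
    return frames


# Toy input, not taken from the Ad5 sequence.
example = stop_codons_by_frame("ATGTAAGCTAGC")
print(example)
```

Long gaps between consecutive stop coordinates within one frame correspond to the open stretches discussed in the text.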

Lewis et al. (1980) have mapped two small polypeptides (17 000 and 16 500) in the IVa2 region of Ad2 DNA, but they did not indicate from which strand the mRNAs are transcribed, or whether they are at all related to pIVa2. As indicated in Fig. 6A, the r strand contains one other region available for protein synthesis (in the absence of RNA processing). The AUG initiation signal is located within the intron of the IVa2 messenger. The expected Mr for this theoretical polypeptide would be 12 200. In addition, the l strand (Fig. 6B) has open frames between TAG-4532 and TAG-5081 (frame 2 does not contain any ATG triplets in the given interval), between TAG-5081 and TAG-5444 (frame 2), and between TAG-5263 and TAG-5818 (frame 1). The latter two contain ATG triplets allowing the synthesis of polypeptides of 50 and 45 amino acids, respectively.

(c) The first leader of the major late mRNAs

Baker and Ziff (1980) have described the primary structure of the 383 bp preceding the HindIII site at map coordinate 0.17 in Ad2 DNA. This sequence is identical with the tract 5858-6240 reported in this paper, with the exception of three one-base differences at positions 5977, 6105 and 6205 (see Fig. 3). In their report, Baker and Ziff demonstrate that this stretch encodes the 5'-terminal leader segment that occurs in all major Ad2 mRNAs. Since the Ad5 sequence is virtually identical we conclude from the Ad2 data that in Ad5 DNA this leader segment is encoded by nucleotides 6049-6089. The leader segment is preceded by a "Goldberg-Hogness box" TATAAAA at positions 6018-6024.

        10         20         30         40         50         60
METRGRRPAA LQHQQDQPQA HPGQRAARSA PLHRDPDYAD EDPAPVERHD PGPSGRAPTT
        70         80         90        100        110        120
AVQRKPPQPA KRGDMLDRDA VEQVTELWDR LELLGQTLKS MPTADGLKPL KNFASLQELL
       130        140        150        160        170        180
SLGGERLLAD LVRENMRVRD MLNEVAPLLR DDGSCSSLNY QLHPVIGVIY GPTGCGKSQL
       190        200        210        220        230        240
LRNLLSSQLI SPTPETVFFI APQVDMIPPS ELKAWEMQIC EGNYAPGPDG TIIPQSGTLR
       250        260        270        280        290        300
PRFVKMAYDD LILEHNYDVS DPRNIFAQAA ARGPIAIIMD ECMENLGGHK GVSKFFHAFP
       310        320        330        340        350        360
SKLHDKFPKC TGYTVLWLH NMNPRRDMAG NIANLKIQSK MHLISPRMHP SQLNRFVNTY
       370        380        390        400        410        420
TKGLPLATSL LLKDIFRHHA QRSCYDWIIY NTTPQHEALQ WCYLHPRDGL MPMYLNIQSH
       430        440        450
LYHVLEKIHR TLNDRDRWSR AYRARKTPK*

Fig. 7. Postulated amino acid sequence [one-letter code; Eur. J. Biochem. 74 (1977) 1-6] of Ad5 polypeptide IVa2 as deduced from the nucleic acid sequencing data.

(d) Concluding remarks

Recently, Stillman et al. (1981) have demonstrated that in Ad2 the l strand between map positions 0.315 and 0.11 encodes the mRNA for an 87 000 protein which is the precursor for the 55 000 protein covalently bound to the 5' termini of the viral DNA. This messenger is synthesized in the early and late stages of infection in small quantities; its coordinates indicate that it must contain all the coding information for protein IVa2. Further sequence studies on the region between map positions 0.17 and 0.315 will demonstrate whether the termination codon for the 87 000 polypeptide is situated in the sequence described in this paper, or lies farther to the right.

ACKNOWLEDGEMENTS

The authors wish to thank Drs. A.J. van der Eb and A. de Waard for their stimulating interest, and Ms. M. Lupker for a gift of endoR·SinI. Drs. C. Baker, E. Ziff and J. Engler are thanked for communicating their unpublished sequence data. This work was in part supported by grants from the Foundation for Chemical Research in The Netherlands (J.M.), The Queen Wilhelmina Cancer Fund (J.M., C.P.v.B.) and the Leiden University Fund (C.P.v.B.).

REFERENCES

Aleström, P., Akusjärvi, G., Perricaudet, M., Mathews, M.B., Klessig, D.F. and Pettersson, U.: Structure of the gene for polypeptide IX of adenovirus type 2 and its unspliced mRNA. Cell 19 (1980) 671-681.

Baker, C.C. and Ziff, E.B.: Biogenesis, structures and sites of encoding of the 5' termini of adenovirus-2 mRNAs. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 415-428.

Baker, C.C. and Ziff, E.B.: Promoters and heterogeneous 5' termini of the messenger RNAs of adenovirus-2. J. Mol. Biol. 149 (1981) 189-221.

Berget, S.M., Moore, C. and Sharp, P.A.: Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. USA 74 (1977) 3171-3175.

Bos, J.L., Polder, L.J., Bernards, R., Schrier, P.I., Van den Elsen, P.J., Van der Eb, A.J. and Van Ormondt, H.: The 2.2 kb E1b mRNA of human Ad12 and Ad5 codes for two tumor antigens starting at different AUG triplets. Cell 27 (1981) 121-131.

Chow, L.T., Roberts, J.M., Lewis, J.B. and Broker, T.R.: A map of cytoplasmic RNA transcripts from lytic adenovirus type 2, determined by electron microscopy of RNA-DNA hybrids. Cell 11 (1977) 819-836.

Chow, L.T., Broker, T.R. and Lewis, J.B.: The complex splicing patterns of RNA from the early regions of Ad2. J. Mol. Biol. 134 (1979) 265-303.

Gannon, F., O'Hare, K., Perrin, F., Le Pennec, J.P., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B. and Chambon, P.: Organization and sequences at the 5' end of a cloned complete ovalbumin gene. Nature 278 (1979) 428-434.

Lewis, J.B., Esche, H., Smart, J.E., Stillman, B.W., Harter, M.L. and Mathews, M.B.: Organization and expression of the left third of the genome of adenovirus. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 493-508.

Lupker, H.S.C. and Dekker, B.M.M.: Purification of the sequence-specific endonuclease SinI from Salmonella infantis. Biochim. Biophys. Acta 654 (1981) 297-299.

Maat, J. and Smith, A.J.H.: A method for sequencing restriction fragments with dideoxynucleoside triphosphates. Nucl. Acids Res. 5 (1978) 4537-4546.

Maat, J. and Van Ormondt, H.: The nucleotide sequence of the transforming HindIII-G fragment of adenovirus type 5 DNA. Gene 6 (1979) 75-90.

Maat, J., Van Beveren, C.P. and Van Ormondt, H.: The nucleotide sequence of adenovirus type 5 early region E1: the region between map positions 8.0 (HindIII site) and 11.8 (SmaI site). Gene 10 (1980) 27-38.

Maxam, A.M. and Gilbert, W.: A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74 (1977) 560-564.

Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 65. Academic Press, New York, 1980, pp. 499-560.

Persson, H., Mathisen, B., Philipson, L. and Pettersson, U.: A maturation protein in adenovirus morphogenesis. Virology 93 (1979) 198-208.

Philipson, L.: Adenovirus proteins and their messenger RNAs. Adv. Virus Res. 25 (1979) 357-405.

Proudfoot, N.J. and Brownlee, G.G.: 3' Non-coding region sequences in eucaryotic mRNA. Nature 263 (1976) 211-214.

Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.

Smith, A.J.H.: The use of exonuclease III for preparing single-stranded DNA for use as a template in the chain terminator sequencing method. Nucl. Acids Res. 6 (1979) 831-848.

Stillman, B.W., Lewis, J.B., Chow, L.T., Mathews, M.B. and Smart, J.E.: Identification of the gene and mRNA for the adenovirus terminal protein precursor. Cell 23 (1981) 497-508.

Tooze, J.: The Molecular Biology of Tumor Viruses. Part 2. DNA Tumor Viruses, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1980.

Tu, C.D., Jay, E., Bahl, C.P. and Wu, R.: A reliable mapping method for sequence determination of oligodeoxyribonucleotides by mobility shift analysis. Anal. Biochem. 74 (1976) 73-93.

Van der Eb, A.J., Van Ormondt, H., Schrier, P.I., Lupker, J.H., Jochemsen, H., Van den Elsen, P.J., DeLeys, R.J., Maat, J., Van Beveren, C.P. and De Waard, A.: Structure and function of the transforming genes of human adenoviruses and SV40. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 383-399.

Van Ormondt, H., Maat, J., De Waard, A. and Van der Eb, A.J.: The nucleotide sequence of the transforming HpaI-E fragment of adenovirus type 5 DNA. Gene 4 (1978) 309-328.

Van Ormondt, H., Maat, J. and Van Beveren, C.P.: The nucleotide sequence of the transforming early region E1 of adenovirus type 5 DNA. Gene 11 (1980) 299-309.

Wilson, M.C., Fraser, N.W. and Darnell Jr., J.E.: Mapping of the RNA initiation sites by high doses of UV irradiation: evidence for 3 independent promoters within the left 11% of the Ad2 genome. Virology 94 (1979) 175-184.

Wold, W.S.M., Green, M. and Büttner, W.: Adenoviruses, in Nayak, D.P. (Ed.), The Molecular Biology of Animal Viruses. Dekker, New York, 1978, pp. 673-768; 891-898.

Ziff, E.B. and Evans, R.M.: Coincidence of the promoter and capped 5' terminus of RNA from the adenovirus 2 major late transcription unit. Cell 15 (1978) 1463-1475.

Zimmern, D. and Kaesberg, P.: 3'-terminal nucleotide sequence of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 75 (1978) 4257-4261.

Communicated by W. Fiers.