Gene, 16 (1981) 179-189 Elsevier/North-Holland Biomedical Press
The nucleotide sequence of the gene for protein IVa2 and of the 5’ leader segment of the major late mRNAs of adenovirus type 5
(RNA coordinates; reverse transcription; coding capacity; Maxam-Gilbert technique; chain terminators)
C.P. van Beveren *, J. Maat *, B.M.M. Dekker and H. van Ormondt **
Department of Medical Biochemistry, Sylvius Laboratories, University of Leiden, Wassenaarseweg 72, 2333 AL Leiden (The Netherlands)
(Received June 18th, 1981) (Accepted September 15th, 1981)
SUMMARY
We present here the primary structure of the region of human adenovirus 5 (Ad5) DNA from nucleotide 4001 through the HindIII site at nucleotide 6246 (map positions 0.11 to 0.17). The corresponding region in the closely related adenovirus type 2 (Ad2) encodes the spliced mRNA for viral protein IVa2 (Chow et al., 1979; Persson et al., 1979). Reverse transcription of the Ad5 pIVa2 mRNA localized the 5’ terminus of the mRNA to approximately position 5840, and its splice coordinates to positions 5706 and 5427. From the data of Aleström et al. (1980) for Ad2, the 3’ end of this mRNA was inferred to be specified by Ad5 nucleotide 4060. The nucleic acid data allow us to predict an M_r of 50 873 for the IVa2 protein of Ad5, which is close to the experimentally determined value of 50 000 (Persson et al., 1979). The DNA sequence described here also includes the information for the 5’-terminal leader segment of the major late mRNAs of Ad5.
* Present addresses: (C.P.v.B.) The Salk Institute, La Jolla, CA 92112, U.S.A.; (J.M.) Unilever Research Laboratory, Vlaardingen, The Netherlands.
** To whom all correspondence should be addressed.
Abbreviations: Ad, adenovirus; AMV, avian myeloblastosis virus; bp, base pairs; ddNTP, dideoxynucleoside triphosphate; kd, kilodalton; pIVa2, viral polypeptide IVa2; SDS, sodium dodecylsulfate.
0378-1119/81/0000-0000/$02.75 © 1981 Elsevier/North-Holland Biomedical Press

INTRODUCTION
Adenoviruses provide a useful model for cell transformation and oncogenesis, as well as for the control of gene expression in eukaryotic cells. In the past decade our knowledge of the molecular biology of adenoviruses has increased impressively (for reviews see Wold et al. (1978), Philipson (1979) and Tooze (1980)). The expression of the linear adenovirus genome is temporally regulated. The functions of the early gene products are not yet fully understood, but as a result of their expression the cell becomes primed for the synthesis of progeny virus. One cluster of early genes (region E1) is located at the left terminus (map positions 0.013-0.111); it is transcribed rightwards from the r strand (Fig. 1). Neighbouring, and partially overlapping, it are two genes which are expressed at an intermediate stage of infection, namely those for viral polypeptides pIX and pIVa2. The information for pIX is situated entirely within early region E1; however, it has its own promoter (Wilson et al., 1979; Maat et al., 1980) which controls the (rightward) transcription of the unspliced pIX mRNA (Aleström et al., 1980). The gene for pIVa2 is transcribed leftward from the l strand and has been mapped in Ad2 between map positions 0.161 and 0.111 (Fig. 1). The genes for pIX and pIVa2 show an overlap of some 10 nucleotides (Aleström et al., 1980). The cytoplasmic mRNA for pIVa2 is spliced between map positions 0.156 and 0.149 (Chow et al., 1979). The pIVa2 gene product, a maturation protein involved in virion assembly (Persson et al., 1979), has an M_r of 50 000, as estimated by SDS-polyacrylamide electrophoresis.

Fig. 1. Organization of the left terminus of Ad5 DNA. In the middle line a few restriction sites have been indicated; the letters flanking these sites are the designations of the restriction fragments. The hatched rectangle denotes the region whose sequence is given in this paper. In the bottom part of the figure the transcription units identified in the left-terminal 17% of the Ad5 genome are indicated.

We have embarked upon the determination of the primary structure of the left terminus of the Ad5 DNA (total genome length: 36 500 bp), with the primary aim of elucidating the organization of early region E1 which contains the genes responsible for cell transformation (Van der Eb et al., 1980). The nucleotide sequence of the transforming region is now completed and has been described in a series of reports (Van Ormondt et al., 1978; Maat and Van Ormondt, 1979; Maat et al., 1980; Van Ormondt et al., 1980). The present paper deals with the Ad5 DNA sequence between map positions 0.11 (nucleotide 4001) and 0.17 (nucleotide 6246, HindIII site), which should contain the information for Ad5 polypeptide IVa2 (see above). It will be shown that the determined sequence indeed harbours all the required regulatory elements and sufficient reading space to code for a 50 kd protein. In addition, we have located the Ad5 major late promoter and the first leader segment for late mRNA, which have been mapped in the genome of the closely related Ad2 at map position 0.163 (Chow et al., 1977; Berget et al., 1977; Ziff and Evans, 1978).
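The relation between map positions and nucleotide numbers used throughout can be sketched as a simple proportion of the genome length. The snippet below is only an illustration: the function names are hypothetical, and the genome length is the approximate figure of 36 500 bp quoted above, so converted values are expected to agree with the published coordinates only to about two decimal places.

```python
# Minimal sketch; helper names are hypothetical, not taken from the paper.
GENOME_LENGTH_BP = 36_500  # approximate total Ad5 genome length quoted in the text


def to_map_position(nucleotide: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Fraction of the genome length, counted from the left terminus."""
    return nucleotide / genome_length


def to_nucleotide(map_position: float, genome_length: int = GENOME_LENGTH_BP) -> int:
    """Approximate nucleotide number corresponding to a fractional map position."""
    return round(map_position * genome_length)


if __name__ == "__main__":
    for nt in (4001, 6246):
        print(nt, round(to_map_position(nt), 2))
```

Because the genome length is itself rounded, the conversion is better suited to checking consistency (nucleotide 4001 near 0.11, nucleotide 6246 near 0.17) than to deriving exact coordinates.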
MATERIALS AND METHODS
(a) DNA and enzymes
The preparation and isolation of Ad5 DNA and restriction fragments, the preparation and sources of enzymes and other materials, and the methods for electrophoretic fractionation have been described previously (Van Ormondt et al., 1978; Maat and Van Ormondt, 1979). Endonucleases SinI (isoschizomer of AvaII; Lupker and Dekker, 1981) and AcyI (De Waard et al., 1978) were gifts of Ms. M. Lupker and Dr. de Waard, and AMV reverse transcriptase was a gift of Dr. Beard (St. Petersburg, FL) to P.J. van den Elsen of this laboratory. AccI was purchased from New England Biolabs (Beverly, MA), and nuclease S1 from Sigma Biochemicals (St. Louis, MO).
(b) Sequencing procedures
Sequence analysis followed the procedures of Maxam and Gilbert (1977; 1980), Maat and Smith (1978) and, in some instances, of Sanger et al. (1977). On two occasions when the gel techniques
Fig. 2. Schematic representation of the sequenced tracts of DNA in the l-strand (l) and r-strand (r) of the region between map positions 0.113 (SmaI NG junction) and 0.17 (HindIII EC junction). The sequence of nucleotides 4001-4125 was described previously (Maat et al., 1980). In the fragment names, the first letter denotes the endonuclease used for the primary cleavage, the second letter the enzyme used for the secondary restriction cut to separate two end-labeled segments of the primary restriction fragments. The following abbreviations were used: A-AluI; B-MboII; D-PvuII; F-HinfI; H-HhaI; IWrrI; M-MboI; O-AosII; P-HphI; SSFWI; T-TaqI; U-AsuI; %lvuI; X-XhoI; Y-HpaII; Z-HaeIII. E.g., U2H5 is the result of HhaI cleavage of AsuI fragment U2; it is that portion of U2 that overlaps with H5. The fragments reading across the SmaI site at map position 0.113, e.g. (E)F5M4, were obtained by cleavage of HindIII-E. Fragments indicated with a dot were sequenced using chain terminators. The lines at the bottom of the figure show those portions of the l- and r-strands that have been sequenced.
4090 4100 CTTGCTGTCT TTATTTAGGG GAACGACAGA AATAAATCCC term pIVa2 4170 4180 4190 4200 TTTTCCAGGA CGTGGTAAAG GTGACTCTGG ATGTTCAGAT AAAAGGTCCT GCACCATTTC CACTGAGACC TACAAGTCTA
4070$f2;;;'4080 GATTTGGATC AAGCAAGTGT CTAAACCTAG TTCGTTCACA
5210 5220 5230 5240 5250 5260 5270 5280 5290 5300 GGCATCTCGA TCCAGCATAT CTCCTCGTTT CGCGGGTTGG GGCGGCTTTC GCTGTACGGC AGTAGTCGGT GCTCGTCCAG ACGGGCCAGG GTCATGTCTT CCGTAGAGCT AGGTCGTATA GAGGAGCAAA GCGCCCAACC CCGCCGAAAG CGACATGCCG TCATCAGCCA CGAGCAGGTC TGCCCGGTCC CAGTACAGAA
5110 5120 5130 5140 5150 5160 5170 5180 5190 5200 CAAAGTTTTT CAACGGTTTG. AGACCGTCCG CCGTAGGCAT GCTTTTGAGC GTTTGACCAA GCAGTTCCAG GCGGTCCCAC AGCTCGGTCA CCTGCTCTAC GTTTCAAAAA GTTGCCAAAC TCTGGCAGGC GGCATCCGTA CGAAAACTCG CAAACTGGTT CGTCAAGGTC CGCCAGGGTG TCGAGCCAGT GGACGAGATG
5010 5020 5030 5040 5050 5060 5070 5080 5090 5100 GGGGCCACTT CGTTAAGCAT GTCCCTGACT CGCATGTTTT CCCTGACCAA ATCCGCCAGA AGGCGCTCGC CGCCCAGCGA TAGCAGTTCT TGCAAGGAAG CCCCGGTGAA GCAATTCGTA CAGGGACTGA GCGTACAAAA GGGACTGGTT TAGGCGGTCT TCCGCGAGCG GCGGGTCGCT ATCGTCAAGA ACGTTCCTTC
4910 4920 4930 4940 4950 4960 4970 4980 4990 5000 CAGCTGCGAC TTACCGCAGC CGGTGGGCCC GTAAATCACA CCTATTACCG GGTGCAACTG GTAGTTAAGA GAGCTGCAGC TGCCGTCATC CCTGAGCAGG GTCGACGCTG AATGGCGTCG GCCACCCGGG CATTTAGTGT GGATAATGGC CCACGTTGAC CATCAATTCT CTCGACGTCG ACGGCAGTAG GGACTCGTCC
4810 4820 4830 4840 4850 4860 4870 4880 4890 4900 CTTTGAGTTC AGATGGGGGG ATCATGTCTA CCTGCGGGGC GATGAAGAAA ACGGTTTCCG GGGTAGGGGA GATCAGCTGG GAAGAAAGCA GGTTCCTGAG GAAACTCAAG TCTACCCCCC TAGTACAGAT GGACGCCCCG CTACTTCTTT TGCCAAAGGC CCCATCCCCT CTAGTCGACC CTTCTTTCGT CCAAGGACTC
4710 4720 4730 4740 4750 4760 4770 4780 4790 4800 GCCATTTTTA CAAAGCGCGG GCGGAGGGTG CCAGACTGCG GTATAATGGT TCCATCCGGC CCAGGGGCGT AGTTACCCTC ACAGATTTGC ATTTCCCACG CGGTAAAAAT GTTTCGCGCC CGCCTCCCAC GGTCTGACGC CATATTACCA AGGTAGGCCG GGTCCCCGCA TCAATGGGAG TGTCTAAACG TAAAGGGTGC
4610 4620 4630 4640 4650 4660 4670 4680 4690 4700 GTCCATAATG ATGGCAATGG GCCCACGGGC GGCGGCCTGG GCGAAGATAT TTCTGGGATC ACTAACGTCA TAGTTGTGTT CCAGGATGAG ATCGTCATAG CAGGTATTAC TACCGTTACC CGGGTGCCCG CCGCCGGACC CGCTTCTATA AAGACCCTAG TGATTGCAGT ATCAACACAA GGTCCTACTC TAGCAGTATC
4510 4520 4530 4540 4550 4560 4570 4580 4590 4600 TGTATCCGGT GCACTTGGGA AATTTGTCAT GTAGCTTAGA AGGAAATGCG TGGAAGAACT TGGAGACGCC CTTGTGACCT CCAAGATTTT CCATGCATTC ACATAGGCCA CGTGAACCCT TTAAACAGTA CATCGAATCT TCCTTTACGC ACCTTCTTGA ACCTCTGCGG GAACACTGGA GGTTCTAAAA GGTACGTAAG
4410 4420 4430 4440 4450 4460 4470 4480 4490 4500 CGTGGGGATA TGAGATGCAT CTTGGACTGT ATTTTTAGGT TGGCTATGTT CCCAGCCATA TCCCTCCGGG GATTCATGTT GTGCAGAACC ACCAGCACAG GCACCCCTAT ACTCTACGTA GAACCTGACA TAAAAATCCA ACCGATACAA GGGTCGGTAT AGGGAGGCCC CTAAGTACAA CACGTCTTGG TGGTCGTGTC
4310 4320 4330 4340 4350 4360 4370 4380 4390 4400 GGCGTGGTGC CTAAAAATGT CTTTCAGTAG CAAGCTGATT GCCAGGGGCA GGCCCTTGGT GTAAGTGTTT ACAAAGCGGT TAAGCTGGGA TGGGTGCATA CCGCACCACG GATTTTTACA GAAAGTCATC GTTCGACTAA CGGTCCCCGT CCGGGAACCA CATTCACAAA TGTTTCGCCA ATTCGACCCT'ACCCACGTAT
4210 4220 4230 4240 4250 4260 4270 4280 4290 4300 ACATGGGCAT AAGCCCGTCT CTGGGGTGGA GGTAGCACCA CTGCAGAGCT TCATGCTGCG GGGTGGTGTT GTAGATGATC CAGTCGTAGC AGGAGCGCTG TGTACCCGTA TTCGGGCAGA GACCCCACCT CCATCGTGGT GACGTCTCGA AGTACGACGC CCCACCACAA CATCTACTAG GTCAGCATCG TCCTCGCGAC
4050 4060 AAAAAACCAG ACTCTGTTTG TTTTTTGGTC TGAGACAAAC IVd,poly (AI cl 4110 4120 4130 4140 4150 4160 GTTTTGCGCG CGCGGTAGGC CCGGGACCAG CGGTCTCGGT CGTTGAGGGT CCTGTGTATT CAAAACGCGC GCGCCATCCG GGCCCTGGTC GCCAGAGCCA GCAACTCCCA GGACACATAA
4010 4020 4030temprx4040 5' GAAGGCTTCC TCCCCTCCCA ATGCGGTTTAAACATAAAT 3' CTTCCGAAGG AGGGGAGGGT TACGCCAAAT TTTGTATTTA 2
5540 5550 5560 5570 5580 5590 5600 GGGCAGTGCA GACTTTTGAG GGCGTAGAGC TTGGGCGCGA GAAATACCGA TTCCGGGGAG TAGGCATCCG CCCGTCACGT CTGAAAACTC CCGCATCTCG AACCCGCGCT CTTTATGGCT AAGGCCCCTC ATCCGTAGGC
5440 5450 5460 5470 5480 5490 5500 CGCGTCGGCC AGGTAGCATT TGACCATGGT GTCATAGTCC AGCCCCTCCG CGGCGTGGCC CTTGGCGCGC GCGCAGCCGG TCCATCGTAA ACTGGTACCA CAGTATCAGG TCGGGGAGGC GCCGCACCGG GAACCGCGCG
5 980 5990 6000 TGGTTTGTAG GTGTAGGCCA CGTGACCGGG ACCAAACATC CACATCCGGT GCACTGGCCC major late, 5' splice 6080 60s50 6100 GCGAGGGCCA GCTGTTGGGG TGAGTACTCC CGCTCCCGGT CGACAACCCC ACTCATGAGG
Fig. 3. The nucleotide sequence of Ad5 DNA between positions 4001 and 6246 (HindIII site). Since we have deleted one nucleotide from the preceding DNA region (Bos et al., 1981) the nucleotide numbers of the first 125 residues are one lower than those of Maat et al. (1980). In the appropriate strands we have underlined the polyadenylation signals (AATAAA) and sites of the mRNAs for pIVa2 and of the E1b region (and pIX) (Aleström et al., 1980), the “Goldberg-Hogness” box (TATAAAA) for the major late transcription unit and the initiation and termination codons of pIVa2. The 5’ termini of the pIVa2 (Baker and Ziff, 1981; this paper) and major late (Baker and Ziff, 1980) mRNAs are also indicated, as are their splice points (see DISCUSSION). The positions where the Ad2 DNA sequence (Baker and Ziff, 1980) differs from that of Ad5 have been marked with arrows.
4 6210 6220 6230 6240 CCGCATCCAT CTGGTCAGAA AAGACAATCT TTTTGTTGTC AAGCTT GGCGTAGGTA GACCAGTCTT TTCTGTTAGA AAAACAACAG TTCGAA
1 6110 6120 6130 6140 6150 6160 6170 6180 6190 6200 CTCTGAAAAG CGGGCATGAC TTCTGCGCTA AGATTGTCAG TTTCCAAAAA CGAGGAGGAT TTGATATTCA CCTGGCCCGC GGTGATGCCT TTGAGGGTGG GAGACTTTTC GCCCGTACTG AAGACGCGAT TCTAACAGTC AAAGGTTTTT GCTCCTCCTA AACTATAAGT GGACCGGGCG CCACTACGGA AACTCCCACC
5950 5960 5970 GTCGCCCTCT TCGGCATCAA GGAAGGTGAT CAGCGGGAGA AGCCGTAGTT CCTTCCACTA r5' end, major late mRNA 6050 6060 6070 TCGTCCTCAC TCTCTTCCGC ATCGCTGTCT AGCAGGAGTG AGAGAAGGCG TAGCGACAGA
5850 5860 5870 5880 5890 5900 CAAAGGCTCG CGTCCAGGCC AGCACGAAGG AGGCTAAGTG GGAGGGGTAG CGGTCGTTGT GTTTCCGAGC GCAGGTCCGG TCGTGCTTCC TCCGATTCAC CCTCCCCATC GCCAGCAACA
5720 5730 5740 5750 5760 5770 5780 5790 5800 GTTTCCATGA GCCGGTGTCC ACGCTCGGTG ACGAAAAGGC TGTCCGTGTC CCCGTATACA GACTTGAGAG GCCTGTCCTC GAGCGGTGTT CP&G~;T CGGCCACAGG TGCGAGCCAC TGCTTTTCCG ACAGGCACAG GGGCATATGT CTGAACTCTC CGGACAGGAG CTCGCCACAA
5820 5830 5840 CCTCGTATAG AAACTCGGAC CACTCTGAGA GGAGCATATC TTTGAGCCTG GTGAGACTCT 5'eru?, pIVa2 mRNA &&I 5910 5920 5930 5940 CCACTAGGGG GTCCACTCGC TCCAGGGTGT GAAGACACAT GGTGATCCCC CAGGTGAGCG AGGTCCCACA CTTCTGTGTA Hogness 6010 6020 6030 6040 TGTTCCTGAA GGGGGGCTAT AAAAGGGGGT GGGGGCGCGT ACAAGGACTT CCCCCCGATA TTTTCCCCCA CCCCCGCGCA
5710 CTTACCTCTG GAATGGAGAC 5' spZice 1 5810 CCGCGGTCCT GGCGCCAGGA
5610 5620 5630 5640 5650 5660 5670 5680 5690 5700 CGCCGCAGGC CCCGCAGACG GTCTCGCATT CCACGAGCCA GGTGAGCTCT GGCCGTTCGG GGTCAAAAAC CAGGTTTCCC CCATGCTTTT TGATGCGTTT GCGGCGTCCG GGGCGTCTGC CAGAGCGTAA GGTGCTCGGT CCACTCGAGA CCGGCAAGCC CCAGTTTTTG GTCCAAAGGG GGTACGAAAA ACTACGCAAA
5410 5420 5430 GGTGCTGAAG CGCTGCCGGT CTTCGCCCTG CCACGACTTC GCGACGGCCA GAAGCGGGAC 3' splice I 5510 5520 5530 AGCTTGCCCT TGGAGGAGGC GCCGCACGAG TCGAACGGGA ACCTCCTCCG CGGCGTGCTC
5310 5320 5330 5340 5350 5360 5370 5380 5390 5400 TCCACGGGCG CAGGGTCCTC GTCAGCGTAG TCTGGGTCAC GGTGAAGGGG TGCGCTCCGG GCTGCGCGCT GGCCAGGGTG CGCTTGAGGC TGGTCCTGCT AGGTGCCCGC GTCCCAGGAG CAGTCGCATC AGACCCAGTG CCACTTCCCC ACGCGAGGCC CGACGCGCGA CCGGTCCCAC GCGAACTCCG ACCAGGACGA
yielded irreproducible readings we had to resort to the “wandering spot” method (Tu et al., 1976) to obtain a definitive answer. The DNA sequence data were processed with the computer programs of Staden (1977; 1978). The splice point in pIVa2 mRNA was determined by annealing the HphI fragment P5 (nucleotides 5178-5352), which had been made single-stranded by treatment with exonuclease III (Smith, 1979), to Ad5 late poly(A)+ RNA in 50% formamide, 50 mM Tris·HCl, pH 8.3, 1 mM EDTA, 0.6 M KCl and 0.1% SDS (3 min at 65°C, 2 h at 45°C). The primer fragment P5 was extended by a 30-min incubation at 37°C with AMV reverse transcriptase (in 50 mM Tris·HCl, pH 8.3, 50 mM KCl, 5 mM MgCl2, 10 mM dithioerythritol) in the presence of four dNTPs and one ddNTP; the triphosphate concentrations were modified slightly with respect to those given by Zimmern and Kaesberg (1978). The 5’ terminus of pIVa2 mRNA was established by hybridizing a 5’-labeled primer fragment to late poly(A)+ Ad RNA. This primer was extended with AMV reverse transcriptase and dNTPs; the elongated product was run next to a Maxam-Gilbert sequence pattern for a fragment that was 5’-labeled at the same site as the primer.

Fig. 4. Physical map for a number of restriction sites in the pIVa2 region (from SmaI site at map position 0.113 to HindIII site at 0.17) of Ad5 DNA. For MboII, HphI and HgaI, the actual cleavage sites are indicated by arrowheads, relative to each recognition site. Enzymes mapped: A-AluI, B-MboII, C-FnuDII, F-HinfI, H-HhaI, M-MboI, P-HphI, R-EcoRII, T-TaqI, U-AsuI, Y-HpaII, Z-HaeIII, RsaI and DdeI. No sites for: AosI, ClaI, HpaI, HindII, EcoRI, BamHI, BglI, BglII, BclI, KpnI, PvuI, XmaIII, EcoRV, SauI, TteI, SalI, MlaI, XbaI.
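Deriving a restriction map from the primary structure, as in Fig. 4, amounts to scanning the sequence for recognition sites. The following is a minimal sketch under stated assumptions: it uses only the last 46 nucleotides of the upper strand of Fig. 3 (6201-6246, as printed), two palindromic recognition sequences chosen for illustration, and a hypothetical helper name `find_sites`; it is not the program used by the authors.

```python
import re

# Upper strand of Fig. 3, nucleotides 6201-6246, read 5'->3' as printed.
SEGMENT_START = 6201
SEGMENT = "CCGCATCCAT" "CTGGTCAGAA" "AAGACAATCT" "TTTTGTTGTC" "AAGCTT"

# Two palindromic recognition sequences, so one strand is enough to search.
RECOGNITION = {"HindIII": "AAGCTT", "AluI": "AGCT"}


def find_sites(seq: str, start: int, pattern: str) -> list[int]:
    """Genome coordinate of the first base of every (possibly overlapping) match."""
    return [start + m.start() for m in re.finditer(f"(?={pattern})", seq)]


if __name__ == "__main__":
    for enzyme, site in RECOGNITION.items():
        print(enzyme, find_sites(SEGMENT, SEGMENT_START, site))
```

In this segment the HindIII recognition sequence occupies nucleotides 6241-6246, in line with the HindIII site that ends the sequence at nucleotide 6246. Non-palindromic sites such as those of MboII or HphI would additionally require a search of the complementary strand.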
RESULTS
(a) DNA sequence
By use of the sequencing techniques mentioned under MATERIALS AND METHODS we obtained a number of overlapping gel readings (schematically drawn in Fig. 2) which allowed us to deduce the primary structure reported here. More than 95% of the sequence was determined in both DNA strands (Fig. 2). In Fig. 3 the nucleotide sequence of Ad5 DNA from nucleotide 4001 to nucleotide 6246 is given in double-stranded format. The first nucleotides given here have already been described in a previous paper (Maat et al., 1980) but are shown again because they encode the 3’ terminus of the IVa2 mRNA as will be demonstrated below. In the sequence we have indicated the 5’ termini of the pIVa2 and major late mRNAs and the suspected promoter for the latter, the initiation and termination codons of polypeptide IVa2 and the splice points in its mRNA, and the poly(A) addition sites and signals in the mRNAs for polypeptides IX and IVa2. Fig. 4 gives some restriction endonuclease cleavage maps determined from the primary structure. They confirm the crude maps prepared prior to sequencing the DNA. The sites for endonucleases DdeI and RsaI were deduced from the DNA sequence and have not been tested since they were not commercially available when this work was carried out.
(b) mRNA sequence
By annealing HphI fragment P4 (5178-5352) to late Ad5 mRNA and using it as a primer for chain extension with reverse transcriptase and chain terminator triphosphates, we obtained a sequence gel for a section of the mRNA that is transcribed from noncontiguous areas of the viral genome (the bands in the autoradiograph were too faint to allow photographic reproduction). The C residues in positions 259 and 260 of the cDNA correspond to nucleotides C-5427 and C-5706 in the genomic DNA, so that this sequence gel established the donor and acceptor sites for RNA splicing in the primary transcript.

Fig. 5. Autoradiograph of a Maxam-Gilbert sequence ladder for an AccI-RsaI fragment (positions 5766-6095) 5’-labeled at the AccI site (5766). Alongside the sequence lanes, the 5’-labeled AccI-SinI fragment (5766-5805) elongated to the 5’ terminus of the pIVa2 messenger with reverse transcriptase and dNTPs is seen to comigrate with bands A-5838, G-5837 (major bands), A-5840 and C-5841 (minor bands).

Fig. 5 is the autoradiograph of a Maxam-Gilbert pattern for a portion of the AccI-RsaI fragment (positions 5766-6095) 5’-labeled at the AccI site (5766). Next to the sequence ladder were run the products obtained by primer extension of a shorter single-stranded DNA fragment labeled at the same AccI site (from AccI at 5766 to SinI at 5805) annealed to Ad5 late poly(A)+ RNA. The major extension product is seen to comigrate with residue A5838, while a slightly less dominant band runs with the same speed as G5837; two minor bands run with A5840 and C5841 (all nucleotides in the l strand). Because the chemical degradation sequencing procedure eliminates the detected nucleotide, DNA migrating at a position corresponding to the Nth nucleotide from the labeled terminus is actually N - 1 nucleotides in length. Thus the 5’ terminus of pIVa2 mRNA may be localised to positions 5836 to 5840.
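The N - 1 correction applied above can be written out explicitly. This is a small sketch, assuming that each extension product ends one nucleotide closer to the labeled AccI site than the ladder band it comigrates with; the function name is hypothetical.

```python
LABELED_SITE = 5766  # AccI site carrying the 5' label

# Ladder bands with which the primer-extension products comigrate (Fig. 5).
COMIGRATING_BANDS = {5838: "major", 5837: "major", 5840: "minor", 5841: "minor"}


def terminus_for_band(band: int, labeled_site: int = LABELED_SITE) -> int:
    """Last nucleotide of an extension product that comigrates with a ladder band.

    A chemical-cleavage band assigned to nucleotide N has lost that nucleotide,
    so the comigrating product ends one position closer to the labeled site.
    """
    if band == labeled_site:
        raise ValueError("band cannot coincide with the labeled site")
    step_toward_label = -1 if band > labeled_site else 1
    return band + step_toward_label


termini = sorted(terminus_for_band(b) for b in COMIGRATING_BANDS)

if __name__ == "__main__":
    print(termini, "range:", termini[0], "to", termini[-1])
```

Applied to the four bands, the correction gives termini spanning 5836 to 5840, the interval stated in the text.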
DISCUSSION
The nucleotide sequence described in this paper covers the region between map positions 0.11 and 0.17 of Ad5 DNA. In the closely related Ad2 DNA this region codes for viral protein IVa2 (Chow et al., 1979). Because of the sequence homology between these two human adenovirus strains we can legitimately assume that our sequence encodes the IVa2 protein of Ad5.
(a) Coordinates of Ad5 pIVa2 mRNA
We have located the template for the 5’ terminus of pIVa2 mRNA between positions 5836 and 5840 (Fig. 5). This is in close agreement with the results of Baker and Ziff (1981) whose reported 5’-terminal nucleotides of Ad2 pIVa2 mRNA correspond to positions 5836 and 5838 in our sequence (Fig. 3). This location for the cap template came as a surprise, since at positions 5979-5974 the r strand contains the sequence TACAAA. In Ad2 the corresponding sequence reads TATAAA (Baker and Ziff, 1980) which is identical to the canonical sequence of the “Goldberg-Hogness” box (Gannon et al., 1979). Since this homologous sequence has been observed to occur about 30 nucleotides upstream from the 5’ termini of a number of eukaryotic mRNAs we had expected the cap template for IVa2 mRNA to lie around position 5950. At positions 4095-4090 we located the sequence AATAAA usually associated with the 3’-poly(A) tail of eukaryotic messengers (Proudfoot and Brownlee, 1976). Aleström et al. (1980) have determined the poly(A) addition site for the pIVa2 mRNA of Ad2. By extrapolating their data to the Ad5 sequence we can predict that the analogous site in the Ad5 DNA sequence should be located at nucleotide 4060. As is the case with the vast majority of the adenovirus messengers, the mRNA for pIVa2 consists of segments transcribed from non-contiguous regions of the genomic DNA. By sequencing the suspected region of this RNA with reverse transcriptase and dideoxynucleoside triphosphates, we established that the nucleotides corresponding to G-5706 and G-5427 are joined in the mature mRNA. Thus, in summary, the coordinates of pIVa2 mRNA are 5’-5836-5706-splice-5427-4060-3’.
(b) Coding properties
In Fig. 6A we show how the nonsense codons present in the r strand of the determined sequence are distributed in the three possible reading frames, and how they are located relative to the coordinates of the mRNA for pIVa2. The 5’ terminus of the mRNA is at position 5836, and the first available AUG triplet (5718) is in frame 1 which is interrupted by a nonsense codon at position 5199. However, at the splice point translation switches from frame 1 to frame 3 which permits protein synthesis up to the termination codon UAA at position 4093. Consequently, the coding information for protein IVa2 lies between AUG-5718 and UAA-4093, and should specify a polypeptide of 449 amino acids (Fig. 7) with M_r = 50 873. This value is in agreement with the M_r of pIVa2 (50 000) reported by Persson et al. (1979). The predicted amino acid composition of pIVa2 is given in Table I. This prediction is based on the assumption that protein synthesis initiates at the 5’-proximal AUG of the mRNA as is generally the case in eukaryotic systems. The second AUG is 145 nucleotides downstream from the presumed initiator but lies in another frame. It should be noted that the stop codon UAA coincides with the single polyadenylation signal AAUAAA (see above) for the pIVa2 messenger.
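The coordinates quoted above can be checked arithmetically. The sketch below assumes that the positions given for AUG and UAA refer to the first nucleotide of each codon, that the cap site is taken as 5836, and that coordinates decrease along the mRNA because the gene is transcribed leftward; the variable names are illustrative only.

```python
# pIVa2 mRNA coordinates from the text (genome numbering; values descend 5'->3').
CAP, DONOR, ACCEPTOR, POLY_A = 5836, 5706, 5427, 4060
AUG, STOP = 5718, 4093  # assumed to be the first nucleotide of each codon


def span(high: int, low: int) -> int:
    """Number of nucleotides from `high` down to `low`, both inclusive."""
    return high - low + 1


exon1 = span(CAP, DONOR)            # 5' exon
exon2 = span(ACCEPTOR, POLY_A)      # 3' exon
intron = span(DONOR - 1, ACCEPTOR + 1)
mrna_length = exon1 + exon2         # without the poly(A) tail

# Sense codons: AUG up to the donor site, then acceptor site up to the base
# just before the stop codon.
coding_exon1 = span(AUG, DONOR)
coding_exon2 = span(ACCEPTOR, STOP + 1)
coding_nt = coding_exon1 + coding_exon2
residues, remainder = divmod(coding_nt, 3)

if __name__ == "__main__":
    print("exons:", exon1, exon2, "intron:", intron, "mRNA:", mrna_length)
    print("coding nt:", coding_nt, "->", residues, "amino acids")
```

Under these assumptions the open reading frame comprises 1347 nucleotides, i.e. 449 codons, matching the predicted polypeptide length. The 13 coding nucleotides in the first exon are not a multiple of three, and the removed intron is 278 nucleotides long (also not a multiple of three), which is consistent with the reading frame in the genomic sequence changing across the splice.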
TABLE I
Amino acid composition of pIVa2 predicted from the nucleic acid data
The protein is assumed to initiate at the 5’-proximal AUG triplet on the mRNA.

Amino acid   Number of residues   Amino acid   Number of residues
phe          11                   his          19
leu          50                   gln          24
ile          21                   asn          18
met          16                   lys          20
val          19                   asp          29
ser          24                   glu          19
pro          39                   cys           7
thr          20                   trp           5
ala          33                   arg          35
tyr          14                   gly          26
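The predicted M_r can be approximated from the composition in Table I. This is a sketch under stated assumptions: the residue masses are standard average values (not necessarily those used by the authors), one water molecule is added for the free termini, and no post-translational modification is considered.

```python
# Residue counts from Table I.
COMPOSITION = {
    "phe": 11, "leu": 50, "ile": 21, "met": 16, "val": 19,
    "ser": 24, "pro": 39, "thr": 20, "ala": 33, "tyr": 14,
    "his": 19, "gln": 24, "asn": 18, "lys": 20, "asp": 29,
    "glu": 19, "cys": 7, "trp": 5, "arg": 35, "gly": 26,
}

# Average residue masses in daltons (standard values; an assumption here).
RESIDUE_MASS = {
    "gly": 57.0519, "ala": 71.0788, "ser": 87.0782, "pro": 97.1167,
    "val": 99.1326, "thr": 101.1051, "cys": 103.1388, "leu": 113.1594,
    "ile": 113.1594, "asn": 114.1038, "asp": 115.0886, "gln": 128.1307,
    "lys": 128.1741, "glu": 129.1155, "met": 131.1926, "his": 137.1411,
    "phe": 147.1766, "arg": 156.1875, "tyr": 163.1760, "trp": 186.2132,
}
WATER = 18.015  # added once for the N- and C-terminal groups


def molecular_weight(composition: dict[str, int]) -> float:
    """Sum of residue masses plus one water molecule."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items()) + WATER


total_residues = sum(COMPOSITION.values())
m_r = molecular_weight(COMPOSITION)

if __name__ == "__main__":
    print(total_residues, "residues, M_r about", round(m_r))
```

With these masses the composition sums to 449 residues and gives a value within a few daltons of the 50 873 quoted in the text; small differences are expected from the choice of mass table.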
Fig. 6. Suggested coding properties of the Ad5 pIVa2 region. (A) Top lines: The nonsense codons for the r-strand (tick marks) are arranged according to their reading frames. Also indicated is ATG-5718 encoding the first available met codon in pIVa2 mRNA, and the “Goldberg-Hogness” box (“P”) for the major late transcription unit. The solid black line denotes the postulated coding sequences for polypeptide IVa2, and the hatched rectangle a potential, alternative protein in another reading frame (115 amino acids). The bottom line gives the coordinates of pIVa2 mRNA: 5’ terminus (Baker and Ziff, 1981; this paper); splice points (this paper) and 3’ terminus (Aleström et al., 1980). (B) Nonsense codon distribution in the l-strand.
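A frame-by-frame distribution of nonsense codons of the kind plotted in Fig. 6 can be tabulated as sketched below. The example is illustrative only: the short test sequence is invented, the function name is hypothetical, and frame 1 is simply defined as starting at the first base of the supplied strand (read 5’ to 3’), which need not match the frame numbering used in the figure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq: str) -> dict[int, list[int]]:
    """Map frame (1, 2 or 3) to the 1-based positions of stop-codon first bases."""
    frames: dict[int, list[int]] = {1: [], 2: [], 3: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_CODONS:
            frames[i % 3 + 1].append(i + 1)
    return frames


# Invented toy sequence, not taken from the Ad5 genome.
TOY = "ATGAAATAGCCTGACC"

if __name__ == "__main__":
    for frame, positions in stops_by_frame(TOY).items():
        print("frame", frame, positions)
```

Stretches between successive positions within one frame are the open regions; for a gene read leftward, the strand would first be written out in its own 5’ to 3’ direction before scanning.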
Lewis et al. (1980) have mapped two small polypeptides (17 000 and 16 500) in the IVa2 region of Ad2 DNA, but they did not indicate from which strand the mRNAs are transcribed, or whether they are at all related to pIVa2. As indicated in Fig. 6A, the r strand contains one other region available for protein synthesis (in the absence of RNA processing). The AUG initiation signal is located within the intron of the IVa2 messenger. The expected M_r for this theoretical polypeptide would be 12 200. In addition, the l strand (Fig. 6B) has open frames between TAG-4532 and TAG-5081 (frame 2 does not contain any ATG triplets in the given interval), between TAG-5081 and TAG-5444 (frame 2), and between TAG-5263 and TAG-5818 (frame 1). The latter two contain ATG triplets allowing the synthesis of polypeptides of 50 and 45 amino acids, respectively.

        10         20         30         40         50         60
METRGRRPAA LQHQQDQPQA HPGQRAARSA PLHRDPDYAD EDPAPVERHD PGPSGRAPTT
        70         80         90        100        110        120
AVQRKPPQPA KRGDMLDRDA VEQVTELWDR LELLGQTLKS MPTADGLKPL KNFASLQELL
       130        140        150        160        170        180
SLGGERLLAD LVRENMRVRD MLNEVAPLLR DDGSCSSLNY QLHPVIGVIY GPTGCGKSQL
       190        200        210        220        230        240
LRNLLSSQLI SPTPETVFFI APQVDMIPPS ELKAWEMQIC EGNYAPGPDG TIIPQSGTLR
       250        260        270        280        290        300
PRFVKMAYDD LILEHNYDVS DPRNIFAQAA ARGPIAIIMD ECMENLGGHK GVSKFFHAFP
       310        320        330        340        350        360
SKLHDKFPKC TGYTVLWLH NMNPRRDMAG NIANLKIQSK MHLISPRMHP SQLNRFVNTY
       370        380        390        400        410        420
TKGLPLATSL LLKDIFRHHA QRSCYDWIIY NTTPQHEALQ WCYLHPRDGL MPMYLNIQSH
       430        440        450
LYHVLEKIHR TLNDRDRWSR AYRARKTPK*
Fig. 7. Postulated amino acid sequence [one-letter code; Eur. J. Biochem. 74 (1977) 1-6] of Ad5 polypeptide IVa2 as deduced from the nucleic acid sequencing data.

(c) The first leader of the major late mRNAs
Baker and Ziff (1980) have described the primary structure of the 383 bp preceding the HindIII site at map coordinate 0.17 in Ad2 DNA. This sequence is identical with the tract 5858-6240 reported in this paper, with the exception of three one-base differences at positions 5977, 6105 and 6205 (see Fig. 3). In their report, Baker and Ziff demonstrate that this stretch encodes the 5’-terminal leader segment that occurs in all major Ad2 mRNAs. Since the Ad5 sequence is virtually identical we conclude from the Ad2 data that in Ad5 DNA this leader segment is encoded by nucleotides 6049-6089. The leader segment is preceded by a “Goldberg-Hogness box” TATAAAA at positions 6018-6024.
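The position of the “Goldberg-Hogness” box relative to the leader can be verified directly on the printed sequence. The sketch below assumes the upper strand of Fig. 3 for nucleotides 6001-6040 as printed; the variable names are illustrative.

```python
# Upper strand of Fig. 3, nucleotides 6001-6040, read 5'->3' as printed.
SEGMENT_START = 6001
SEGMENT = "TGTTCCTGAA" "GGGGGGCTAT" "AAAAGGGGGT" "GGGGGCGCGT"

BOX = "TATAAAA"
LEADER_START, LEADER_END = 6049, 6089  # first leader segment, from the text

index = SEGMENT.find(BOX)                  # 0-based offset within the segment
box_start = SEGMENT_START + index
box_end = box_start + len(BOX) - 1
leader_length = LEADER_END - LEADER_START + 1
box_to_cap = LEADER_START - box_start      # spacing between box and leader start

if __name__ == "__main__":
    print("box:", box_start, "-", box_end)
    print("leader length:", leader_length, "nt; box lies", box_to_cap, "nt upstream")
```

The box is found at 6018-6024 and begins 31 nucleotides upstream of the first leader nucleotide, in keeping with the spacing of about 30 nucleotides mentioned in the Discussion; the leader segment itself is 41 nucleotides long.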
(d) Concluding remarks
Recently, Stillman et al. (1981) have demonstrated that in Ad2 the l strand between map positions 0.315 and 0.11 encodes the mRNA for an 87 000 protein which is the precursor for the 55 000 protein covalently bound to the 5’ termini of the viral DNA. This messenger is synthesized in the early and late stages of infection in small quantities; its coordinates indicate that it must contain all the coding information for protein IVa2. Further sequence studies on the region between map positions 0.17 and 0.315 will demonstrate whether the termination codon for the 87 000 polypeptide is situated in the sequence described in this paper, or lies farther to the right.
ACKNOWLEDGEMENTS
The authors wish to thank Drs. A.J. van der Eb and A. de Waard for their stimulating interest, and Ms. M. Lupker for a gift of endoR·SinI. Drs. C. Baker, E. Ziff and J. Engler are thanked for communicating their unpublished sequence data. This work was in part supported by grants from the Foundation for Chemical Research in The Netherlands (J.M.), The Queen Wilhelmina Cancer Fund (J.M., C.P.v.B.) and the Leiden University Fund (C.P.v.B.).
REFERENCES
Aleström, P., Akusjärvi, G., Perricaudet, M., Mathews, M.B., Klessig, D.F. and Pettersson, U.: Structure of the gene for polypeptide IX of adenovirus type 2 and its unspliced mRNA. Cell 19 (1980) 671-681.
Baker, C.C. and Ziff, E.B.: Biogenesis, structures and sites of encoding of the 5’ termini of adenovirus-2 mRNAs. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 415-428.
Baker, C.C. and Ziff, E.B.: Promoters and heterogeneous 5’ termini of the messenger RNAs of adenovirus-2. J. Mol. Biol. 149 (1981) 189-221.
Berget, S.M., Moore, C. and Sharp, P.A.: Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. USA 74 (1977) 3171-3175.
Bos, J.L., Polder, L.J., Bernards, R., Schrier, P.I., Van den Elsen, P.J., Van der Eb, A.J. and Van Ormondt, H.: The 2.2 kb E1b mRNA of human Ad12 and Ad5 codes for two tumor antigens starting at different AUG triplets. Cell 27 (1981) 121-131.
Chow, L.T., Roberts, J.M., Lewis, J.B. and Broker, T.R.: A map of cytoplasmic RNA transcripts from lytic adenovirus type 2, determined by electron microscopy of RNA:DNA hybrids. Cell 11 (1977) 819-836.
Chow, L.T., Broker, T.R. and Lewis, J.B.: The complex splicing patterns of RNA from the early regions of Ad2. J. Mol. Biol. 134 (1979) 265-303.
Gannon, F., O’Hare, K., Perrin, F., Le Pennec, J.P., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B. and Chambon, P.: Organization and sequences at the 5’ end of a cloned complete ovalbumin gene. Nature 278 (1979) 428-434.
Lewis, J.B., Esche, H., Smart, J.E., Stillman, B.W., Harter, M.L. and Mathews, M.B.: Organization and expression of the left third of the genome of adenovirus. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 493-508.
Lupker, H.S.C. and Dekker, B.M.M.: Purification of the sequence-specific endonuclease SinI from Salmonella infantis. Biochim. Biophys. Acta 654 (1981) 297-299.
Maat, J. and Smith, A.J.H.: A method for sequencing restriction fragments with dideoxynucleoside triphosphates. Nucl. Acids Res. 5 (1978) 4537-4546.
Maat, J. and Van Ormondt, H.: The nucleotide sequence of the transforming HindIII-G fragment of adenovirus type 5 DNA. Gene 6 (1979) 75-90.
Maat, J., Van Beveren, C.P. and Van Ormondt, H.: The nucleotide sequence of adenovirus type 5 early region E1: the region between map positions 8.0 (HindIII site) and 11.8 (SmaI site). Gene 10 (1980) 27-38.
Maxam, A.M. and Gilbert, W.: A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74 (1977) 560-564.
Maxam, A.M. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 65. Academic Press, New York, 1980, pp. 499-560.
Persson, H., Mathisen, B., Philipson, L. and Pettersson, U.: A maturation protein in adenovirus morphogenesis. Virology 93 (1979) 198-208.
Philipson, L.: Adenovirus proteins and their messenger RNAs. Adv. Virus Res. 25 (1979) 357-405.
Proudfoot, N.J. and Brownlee, G.G.: 3’ Non-coding region sequences in eucaryotic mRNA. Nature 263 (1976) 211-214.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Smith, A.J.H.: The use of exonuclease III for preparing single-stranded DNA for use as a template in the chain terminator sequencing method. Nucl. Acids Res. 6 (1979) 831-848.
Stillman, B.W., Lewis, J.B., Chow, L.T., Mathews, M.B. and Smart, J.E.: Identification of the gene and mRNA for the adenovirus terminal protein precursor. Cell 23 (1981) 497-508.
Tooze, J.: The Molecular Biology of Tumor Viruses. Part 2. DNA Tumor Viruses, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1980.
Tu, C.D., Jay, E., Bahl, C.P. and Wu, R.: A reliable mapping method for sequence determination of oligodeoxyribonucleotides by mobility shift analysis. Anal. Biochem. 74 (1976) 73-93.
Van der Eb, A.J., Van Ormondt, H., Schrier, P.I., Lupker, J.H., Jochemsen, H., Van den Elsen, P.J., DeLeys, R.J., Maat, J., Van Beveren, C.P. and De Waard, A.: Structure and function of the transforming genes of human adenoviruses and SV40. Cold Spring Harbor Symp. Quant. Biol. 44 (1980) 383-399.
Van Ormondt, H., Maat, J., De Waard, A. and Van der Eb, A.J.: The nucleotide sequence of the transforming HpaI-E fragment of adenovirus type 5 DNA. Gene 4 (1978) 309-328.
Van Ormondt, H., Maat, J. and Van Beveren, C.P.: The nucleotide sequence of the transforming early region E1 of adenovirus type 5 DNA. Gene 11 (1980) 299-309.
Wilson, M.C., Fraser, N.W. and Darnell Jr., J.E.: Mapping of the RNA initiation sites by high doses of UV irradiation: evidence for 3 independent promoters within the left 11% of the Ad2 genome. Virology 94 (1979) 175-184.
Wold, W.S.M., Green, M. and Büttner, W.: Adenoviruses, in Nayak, D.P. (Ed.), The Molecular Biology of Animal Viruses. Dekker, New York, 1978, pp. 673-768; 891-898.
Ziff, E.B. and Evans, R.M.: Coincidence of the promoter and capped 5’ terminus of RNA from the adenovirus 2 major late transcription unit. Cell 15 (1978) 1463-1475.
Zimmern, D. and Kaesberg, P.: 3’-terminal nucleotide sequence of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 75 (1978) 4257-4261.

Communicated by W. Fiers.