Nucleotide sequence of E1 region of canine adenovirus type 2


VIROLOGY 172, 460-467 (1989)

RIRI SHIBATA,* MORIKAZU SHINAGAWA,*,1 YOICHI IIDA,** AND TOSHIO TSUKIYAMA*

*Department of Veterinary Public Health, School of Veterinary Medicine, Obihiro University of Agriculture and Veterinary Medicine, Obihiro, Hokkaido 080; and **Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo 060, Japan

Received March 22, 1989; accepted June 9, 1989

The nucleotide sequence of the leftmost EcoRI-C fragment (0 to 11.3%) of canine adenovirus type 2 (CAd2) was determined. This fragment can transform rodent cells morphologically but requires additional sequences from 10 to 32 map units (m.u.) for full expression of its oncogenic potential. The EcoRI-C fragment contains 3609 nucleotide base pairs (bp) encoding the E1A, E1B, and pIX genes. Although the nucleotide sequence of CAd2 E1 shows little homology to those of human Ads, the amino acid sequences of the E1 proteins predicted from the nucleotide sequence of CAd2 E1 and those of human and simian Ads are partially conserved. © 1989 Academic Press, Inc.
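The abstract gives positions both in map units and in base pairs. As a rough illustration of how the two scales relate, the following sketch converts between them. It assumes that map units scale linearly with distance from the left end and that the EcoRI-C fragment spans exactly 0 to 11.3 m.u.; the function names are hypothetical and the results are estimates only.

```python
# Assumption: map units (m.u.) scale linearly with distance from the left end,
# and the EcoRI-C fragment spans exactly 0 to 11.3 m.u. (3609 bp).
FRAGMENT_BP = 3609
FRAGMENT_MU = 11.3

BP_PER_MU = FRAGMENT_BP / FRAGMENT_MU  # about 319 bp per map unit


def mu_to_bp(mu: float) -> int:
    """Convert a map-unit position to an approximate nucleotide position."""
    return round(mu * BP_PER_MU)


def bp_to_mu(bp: int) -> float:
    """Convert a nucleotide position to an approximate map-unit position."""
    return bp / BP_PER_MU


# Hypothetical usage: estimated genome length and the 10 to 32 m.u. region.
estimated_genome_bp = mu_to_bp(100.0)
complementing_region_bp = (mu_to_bp(10.0), mu_to_bp(32.0))
```

Under these assumptions, 4.5 m.u. corresponds to roughly nucleotide 1437 and the whole genome to roughly 31.9 kb. Both figures come from proportional scaling and are not values reported in the paper.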

INTRODUCTION

Adenoviruses (Ads) contain linear duplex DNA molecules with a molecular weight of 20-30 × 10^6 (Green and Pina, 1964; van der Eb and van Kesteren, 1966; Norrby et al., 1976). The DNA possesses a unique inverted terminal repeat (ITR) which consists of 52 to 196 nucleotide pairs (Shinagawa et al., 1987). Within the ITR region are located the origin of DNA replication and the binding sites for at least two host nuclear proteins, NF-1 (reviewed by Friefeld et al., 1984) and NF-3 (Pruijn et al., 1986), which are required for Ad DNA replication. The genes encoded in the left-end region (early region 1, E1) of the viral genome contribute to the transformation of rodent cells (reviewed by Grand, 1987). The E1 region contains two transcription units, E1A and E1B, whose gene organization is very similar among the various adenovirus serotypes investigated (reviewed by Ziff, 1985). The mRNAs from the E1A region mainly detected in transformed cells or in cells in the early stage of virus infection are the 12 S and the 13 S mRNAs, composed of two exons with an identical 3' splice site and different 5' splice sites (Berk and Sharp, 1978). From the E1B region, the 22 S and 13 S mRNAs are mainly detected in transformed or early infected cells (Berk and Sharp, 1978; Chow et al., 1979). The 9 S mRNA encoding protein IX (pIX), one of the capsid proteins, is transcribed from the E1B region in the late stage of infection (Pettersson and Mathews, 1977; Persson et al., 1978; Alestrom et al., 1980).

Canine adenovirus type 2 (CAd2) DNA and the leftmost 45% of the DNA caused oncogenic transformation of a rat cell line, 3Y1, and the leftmost 11.3% EcoRI-C fragment encoding the complete E1 region transformed the cells only morphologically (Tsukiyama et al., 1988). In this study, the nucleotide sequence of the leftmost 11.3% EcoRI-C fragment of CAd2 was determined and compared with those of other adenoviruses.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. J04368.

1 To whom requests for reprints should be addressed.

0042-6822/89 $3.00. Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Viral DNA. The plasmid pUCEC containing the leftmost 11.3% (EcoRI-C fragment) of the genome of the CAd2 Toronto A 26-61 strain (Tsukiyama et al., 1988) was used as the source of CAd2 DNA. Restriction maps of the fragment were constructed and appropriate restriction fragments were subcloned into pUC118.

Nucleotide sequence analysis. Nucleotide sequencing was performed by the method of Maxam and Gilbert (1980) or the dideoxy sequencing method (Sanger et al., 1977) using the M13 sequence kit (Takara, Kyoto, Japan) or the 7-deaza sequence kit (Takara). Electrophoresis for DNA sequencing was performed using 6% polyacrylamide gels containing 7 M urea. Computer analysis of the nucleotide sequence was carried out using the programs in GENETYX (SDC, Tokyo, Japan).

RESULTS AND DISCUSSION

The restriction maps and sequencing strategy are shown in Fig. 1. Figure 2 shows the nucleotide sequence of the cloned EcoRI-C fragment, which consists of 3609 bp. Shinagawa et al. (1987) reported the DNA sequence of the inverted terminal repeat (ITR) of CAd2 (1-196 bp). The sequence data obtained from the cloned EcoRI-C fragment agree with the previously published data except for an insertion of 2 nucleotides, “CG,” after nucleotide 182, which generates a new SmaI site. To confirm the sequence of this region, CAd2 DNA was analyzed directly. However, this 2-bp insertion could not be detected in the viral DNA by direct sequencing. Hence, the simplest interpretation is that the 2-bp insertion in the cloned EcoRI-C fragment resulted from cloning a rare variant.

[Figure 1: restriction map of the E1 region (HindIII, SmaI, EcoRI, AccI, PvuII, and HinfI sites) with the sequenced clones drawn as arrows]

FIG. 1. Restriction maps and sequencing strategy of the EcoRI-C fragment. Cleavage sites are indicated by vertical lines. Arrows indicate the direction of sequencing using cloned restriction fragments. HindIII sites were overlapped by sequencing HinfI clones. Cloned DdeI fragments (*), a cloned NaeI fragment (**), and clones constructed by exonuclease III and mung bean nuclease treatment (***; Henikoff, 1984) are shown.

Shinagawa et al. (1987) compared the ITRs of various Ads including CAd2, showing that ATAATA (nucleotides 9-14) was conserved among most human and animal Ads. However, the ITR of CAd2 lacks typical consensus sequences of the NF-1 (reviewed by Friefeld et al., 1984) and NF-3 (Pruijn et al., 1986) binding sites. From comparison of the nucleotide sequence (1-3600 bp) of CAd2 with those of Ad5, Ad7, and Ad12 by Harr plot (data not shown), poor homology was detected between CAd2 and human Ads even in the ITRs. There is, however, one relatively long conserved region in which the transcription initiation sites of E1A of human Ads are located (Fig. 4). The homology percentages of Ad5 (nucleotides 491-533) and Ad12 (437-479) to CAd2 (430-473) are 92.9 and 95.2%, respectively.

We have found three polyadenylated RNA molecules in CAd2-transformed cells by Northern analysis: one was a 1.0-kb RNA detected with a probe extending from 0 to 4.5 m.u., and the others were 2.0- and 1.1-kb RNAs detected with probes within 4.5 to 11.3 m.u. All were encoded in the r-strand of CAd2 DNA (Tsukiyama et al., 1988). These RNAs were also found in CAd2-infected cells at 7 hr postinfection. This suggests that the organization of the genes encoded in EcoRI-C resembles that of the E1 genes of human Ads.

To find the coding regions of the RNAs in our sequence data, the occurrence of initiation and termination codons in three frames of the l-strand was examined (Fig. 3). Among several open reading frames (ORFs), five showed amino acid sequence homology to those of human Ads (ORF1-5). Presumed splice sites and other signal sequences, including TATA boxes and polyadenylation signals, which were deduced by comparison with the sequence of the human Ad E1 region, are indicated in Fig. 2. The presumed splice sites agreed well with the sites predicted by computer analysis (Iida, 1987). Adenine at nucleotide 439 was assumed to be the cap site of the first gene from the left end of the CAd2 genome (CAd2 E1A), which seems to correspond to E1A of human Ads (Fig. 4). The TATA box and the polyadenylation signal of CAd2 E1A are at nucleotides 409 and 1334, respectively. The sequence corresponding to the enhancer core element of human Ad E1A (Hearing and Schenk, 1986) is not clear in CAd2. The sequence GGGGCGGGGC at 1381, similar to the Sp1 transcription factor binding sites (Dynan and Tjian, 1983), and the TATA box at 1397 seem to constitute the CAd2 E1B promoter, which is similar to that of Ad2 (reviewed by Berk, 1986). The TATA box of the pIX promoter is at nucleotide 3185, and the polyadenylation signal of E1B and pIX is at 3542. In the human Ads examined so far, the polyadenylation signal of IVa2 is located near that of E1B on the other strand (l-strand). Therefore the polyadenylation signal on the l-strand at 3549 seems to be for IVa2 of CAd2. In Northern analysis, a 1.0-kb RNA encoded between 0 and 4.5 m.u. and 2.0- and 1.1-kb RNAs encoded between 4.5 and 11.3 m.u. were detected in the transformed cells (Tsukiyama et al., 1988). Among various combinations of predicted splice sites in Fig. 2, a few combinations, generating 0.8-, 2-, and 0.9-kb

5'-

~ATCAT~AATAATATACAGGA~AAAGAGGTGTGG~TTAAATTTGGGTGTTGCAAGGGGCGGGGTCATGGGACGGTCAGGTTCAGGT~A~G~~~TGGT~~~ GGTGTTCCCACGGGAATGTCCAGTGACGTCAAAGGCGTGGTTTTACGACAGGGcGAGTTccGcGGAcTTTTGGccGGcGccccGGGTTTTTGGGcGT~~~ TTGATTTTGCGGTTTAGCGGGTGGTGCTTTTACCACTGTTTGCGGAAGATTTAGTTGTTTATGGAGcTGGTTTTGGTGccAGTTccTccAcGGcTAA~~~ CAAAGTTTATGTCAATATAACAGAAACACTCTGTTCTCTGTTTACAGCACCCCACcCGGTGGTTTTTcGccAcGccTTTGGGTTAATTTTATTTccc~~~ A~GCGGCC~~~~~~~CT~AGTG~AGA~GAAAGAGGACT~CTCTTGAGTGCGCAGCGAGAAGAGTTTTCTCTTCGCTGTGTCTCATATATTTT~TGAA~~~ T~AAATATACTATTGTGCCGGCGCCGCGCAATCTCCATGATTATGTTTTAGAGCTACTGGAAGAGTGGcAGccGGAcTGccTTGAcTGTGAGTATTc~~~ TGGCAGCCCCTCGCCGCCTACTCTGCACGATCTTTTTGATGTTGAGCTGGAGACTTCTCACAG~~CTTTTGTGGG~~TGTGTGATTC~TGTGCGGAG~~~ GACACTGATTCGAGTGCGAGCACTGAGGCTGATTCTGGGTTTAGTCCTTTATCCACTCCGCCGGTTTCACCTATTC~AC~G~AT~~~AC~T~T~~TG~~~ GCATTTCTGACGACATGTTGCTGTGCTTAGAGGAAATGCCCACCTTTGATGACGAGGACGAGGTTCGAAGCGCGGCGACCA~CTTTGAG~GGTGGGAAAA CACTTTTGACCCCCATGTGGGTCCTATTTTTGGCTGTTTGCGCTGTGCTTTTTATCAAGAGCAGGATGATAATGCACTTTGTGGGCTTTG~TAT~TAAAG

900 1000

GCCCTTGCCGAA~TAAGTTTTAATTTAAATGTTTGGGCAGGTTAAATGTTTGGGCAGGTTAAATGTTTTAGGTGTGTATTGATTTTTAATTTTGC~~~~ TAGTGCCTTTTGCTATGCCTGTACGTTCA~ACCCGCTTCGGCTGGAGCTGAGGAGGAAGATGATGAAGTTATTTTTGTGTCTGCCAAACCTGGGG~~::~ AAAGAGGTCAGCAGCTACTCCCTGTGAGCCAGATGGGGTCAGCAAACGCCCTTGCGTGCCAGAGCCTGAGCAAACAGAACCTTTGGATTTGTCTTTGAAG CCACGCCCGAACTAATCTCCTTGAGCACAAAGCAATAAAGTAATCTTGTTTAACAAGTTTGCCTACATTTGTGGTTTTACGGGGCGGGGCGAGGAGTATA TAATGCCAAAAGCCAGTGCCTGCTTCAT~~~~TAGACTGAGCTAAGAGCAGGTAGT~GACCCTCTTAAGATTTGTGAAAACTACCTTACTTTTA --GAGCTATAATTAGGGGAAGTACTTTGTCGCCTGGATTTTTTAGGCGGTGGTGTTTTCCTGCCTTGGCTGATGTGGTGGGCAATATAGTGGAACAGGAGGA

1300 1400 1500 1600 1700

AGGCAGGTTTTGGCAAATTTTACCTGAAAACCACGCTTTTTGGGGTCTTTTGCGCAGGGGCTTTACTGTTGCTTCTTTTACTGAAATTATTACAGCAGCT CAGCTGGAAAATAGAGGTAGACAGTTGGCCTTTTTAGCTTTTATATCATTTTTGCTACGCAACTGGCCTTCTGACTCTGTAGTGCCTGAAGCTGACAGAC TTGACCTGGTCTGTGCGCCGGCATGGAGCAGAATGAGAT~GAGCCAGACCGCCAGGTT~TCAACGACCTCCAAGATTCCGTGCTCGAGGAGCAGGGG TCCGCGGAAGAGGAAGAGTGCGAAGAAGCGCTTTTAGCAGGGGACAGCGACGACCCATTATTCGGGTAGATGACTTGCAGCTGCCCGACCCCCTGTATGT

1800 1900 2000 2100

TATGCAAGCTTTGCAACGGGACCACACTTTAGAAATGCCCAGAGGGCAGGTAGATTTTAGCTGGATTGAGGCTGAAGAGAGGCG~TAGGTCCCACAGAC

2200

GAGTGGTACTTTGAGGCTGTGAAGACTTACAAAGCTAAGCCGGGAGATGACTTGCAAACTATAATCAAAAACTATGCCAAGATTTCCTTAGAATGTGGGG CCGTGTATGAAATTAATTCTAAGATTAGGGTTACGGGGGCTTGCTACATTATTGGTAATTGTGCCGTGCTTAGGCCTAACCTGCCTGCTGGAGAAGCAAT

2300 2400

GTTTGAGGTTTTGAATGTTGATTTTATTCCTTCTATTGGTTTTATGGAAAGGATAGTGTTTTCCAATGTTATTTTTGATTGCAGGACCACCGCAACTGTA 2500 GTGTGTTGCATTAGTGAAAGAAACACCTTGTTTCACAATTGTGTTTTTTCTGGCCCTCACATGTTATGTTTGGACCTTAGGGCGGGGGCGGAGGTGAGGG 2600 GCTGTCACTTTGTGGGGGCGGTGTGTGCGTTGCGTAGCAAGGGGCTGTACAGTATTCGAGTCAAAAATAGCATTTTTGAAAAGTGTGCTTTTGGGGTGGT GACCGGGTCAAAGGCTTCTATTAGCCATTGCATGTTTAAGGATTGTACCTGCTCTATTATGCTGGGGGGTCAGGGCACTATTGCCCATAGTCAGTTTATT

2700 2800

GTAACTACTTCTGCTGAGGCCCCCATGAACCTGCAACTGTGCACTTGCGAGGGTAATGGAAGTCATGTAGTTCCATTGGGGAATATTCACTTTGCTTCTC ACCGGGAAGCTTCGTGGCCTACGTTTTATGCAAACACCTTGGTTCGGGTGCGCTTGTATATGGGCCGGCGCCGGGGAGTTTTTCACCCCAAGCAGTCTAC

2900 3000

TTTGTCAATGTGTGTAATTGCAGCCCCTCGGGGGGTTGTGCAGAGAATTTATTTGTTTGGTGTGTATGATGCTACTTGTGCCATTATGCAACTGGGCGAG 3100 GCAGGCAATGCTGCTAGTGAAAGACTGTGTACTTGCGGGTTCAGACACAGCACCCCTTCCCTGCGGGCCACCTATGTAACTGACACCAGGATTGACCGGG AGCTGAACTCTCAAGACACGGCTGAGTTCTTTAGCAGTGATGAAGATAATTTTTTGAGTAGATGGGCGTGGTTTGGGGGAGTATAAAAGGGGCGCGG -----TACGTGGCTGTGTATTTACAGCCATGGACCCTCAACCCTCAACAGAAGGGGCTTGTGAACACGTGTTTTGTGACTACGCGTATTCCGTCTTGGGCAGGAGCAAGACA

3200 3300 3400

GAATGTCACCGGGTCAGATTTAGAAGGAAAGCCCGTGCCCTCAG~TGTGCTGGAAAGTGGACGCCCGCTTGCAGCCCCGCGCATCAGAACTTTGTATGAG 3500 GAGCAGCAGCTGAACATGCTTGCGGTGAATGTTCTTTTGGATGAGCTGAAGATCCAGGTGGCTGCCATGCAAAACTCTGTGACTGCTATTCAGCGAGAAG

3600

TAAATGATCTAAAGCAACGAATCGCCCGAGATTAATGTAAAAATAAAATTTATTTCTTTTTTGAATGATAATACCGTGTCCAGCGTTGTCTGTCTGTAAT AGTTCTATG

FIG. 2. The nucleotide sequence of the EcoRI-C fragment. Nucleotides are numbered from the left end of CAd2. The ITR, the TATA boxes (---), the polyadenylation signals (-), and the presumed cap site of E1A (*) are marked. Opening and closing parentheses indicate presumed 5' and 3' splice sites, respectively, and underscored double lines indicate initiation (ATG) and termination codons (TAA, TAG, TGA) of predicted proteins (see text). A box indicates the HindIII cleavage site at 4.5 m.u.
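The search for initiation (ATG) and termination (TAA, TAG, TGA) codons in three reading frames, which underlies the ORF assignments, can be illustrated with a short sketch. This is not the GENETYX procedure used by the authors. The function name, the frame numbering (taken relative to the first base of the input), and the rule of starting each ORF at the first ATG after a stop codon are assumptions, and spliced reading frames such as ORF-1 plus ORF-2 are not handled.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop ORF on the given strand.

    frame is 1, 2, or 3 (relative to the first base of seq); start and end
    are 1-based positions, with end on the last base of the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for offset in range(3):
        start = None  # 0-based index of the first ATG since the last stop
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # coding codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((offset + 1, start + 1, i + 3))
                start = None
    return orfs


# Hypothetical toy input, not taken from the CAd2 sequence.
toy_orfs = find_orfs("CCATGAAATTTTAGGG")
```

Applied to a sequence retrieved under the accession number given in the paper, a scan of this kind would list candidate frames, which would still need to be checked against splice sites and homology as described in the text.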

mRNAs which may correspond to the RNAs detected by the Northern analysis, were chosen and are shown in Fig. 5. The 0.8-kb mRNA can code for a protein consisting of 232 amino acid residues (232R) in ORF-1 and ORF-2 (Fig. 3). It seems to correspond to the 13 S mRNA product of E1A of human Ads from the comparison of the amino acid sequences (Fig. 6A). The 2.0- and 0.9-kb mRNAs (Fig. 5) seem to correspond to the 22 S and the 13 S mRNAs of E1B of human Ads, respectively. The 2.0-kb mRNA can code for a 133R protein (ORF-3, see Fig. 3) and a 438R protein (ORF-4). The pIX mRNA (0.4 kb) can code for a 103R protein (ORF-5).

[Figure 3: positions of initiation and termination codons in reading frames 1 to 3, with ORF1 to ORF5 marked]

FIG. 3. The occurrence of initiation (ATG) and termination (TAG, TGA, TAA) codons of the l-strand. Vertical lines indicate initiation (above the horizontal lines) and termination (under the horizontal lines) codons in three frames. The open reading frames which show amino acid homology to those of human Ads are designated ORF1-5. Nucleotide numbers are shown at the bottom. I.C., initiation codons; T.C., termination codons.

[Figure 4: aligned nucleotide sequences of CAd2 (409-473), Ad5 (468-533), and Ad12 (424-479)]

FIG. 4. Comparison of the nucleotide sequences from the E1A TATA box. The nucleotide sequences of CAd2, Ad5, and Ad12 are aligned. Asterisks above the sequences of Ad5 and Ad12 indicate nucleotides identical with CAd2. Solid lines indicate TATA boxes. Boxes indicate the cap sites of Ad5 and Ad12. An inverted triangle indicates the presumed cap site of CAd2 E1A.

[Figure 5: map of the presumed E1 mRNAs (0.8, 2.0, 0.9, and 0.4 kb) and their predicted products (232R, 133R, 438R, and 103R)]

FIG. 5. The presumed E1 products. Presumed mRNAs which may correspond to those found by Northern analysis (Tsukiyama et al., 1988) are represented by lines with predicted sizes. Arrowheads indicate the direction of transcription. Caret symbols indicate the presumed splice sites. Positions of the TATA boxes, the presumed splice donors (GT) and acceptors (AG), and the polyadenylation signals (AATAAA) are shown at the top. Open reading frames (ORF) are shown by lines marked off with vertical bars. Resulting E1 proteins are represented by filled bars, and frames (F1-F3), numbers of residues, and positions of initiation (ATG) and termination (TAG, TGA, TAA) codons are shown near the bars. A, E1A products; B, E1B products.

[Figure 6: aligned amino acid sequences of the predicted CAd2 E1 proteins and their counterparts in other adenoviruses, panels A to D]

FIG. 6. Comparison of the amino acid sequences of the E1 proteins between CAd2 and other adenoviruses. Residue numbers are indicated at the right-hand side of the amino acid sequences. Asterisks at the bottom of the alignments indicate that amino acid residues are conserved among more than three viruses, and asterisks above the amino acid sequences indicate that residues are conserved between CAd2 and more than two other viruses. Dots show that all residues compared have some similarity: acidic (D, E), acidic amide (N, Q), basic (H, R, K), aliphatic and hydrophobic (M, I, L, V), small aliphatic (A, G), hydroxyl (S, T), aromatic (F, Y, W), cysteine (C), and proline (P). (A) The predicted 232R protein of CAd2 and the 13 S mRNA products of Ad5, Ad7, Ad12, and SA7 are aligned. The conserved regions reviewed by Moran and Mathews (1987) are represented by boxes (regions 1-3). The small boxes in region 3 indicate the postulated metal-binding residues (Berg, 1986). Inverted triangles in region 3 mark the splice junctions. (B) The predicted 133R protein of CAd2 and the corresponding E1B proteins of Ad5, Ad7, and Ad12 are aligned. Boxes indicate the conserved regions between CAd2 and human Ads (regions 1 and 2). (C) The predicted 438R protein of CAd2 and the corresponding E1B proteins of Ad5, Ad7, and Ad12 are aligned. (D) The predicted 103R protein of CAd2 and the proteins IX of Ad5, Ad7, and Ad12 are aligned.
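The residue similarity classes listed in the legend of Fig. 6 can be expressed as a small lookup table. The sketch below marks each alignment column as identical, similar, or neither. It is a simplification: the figure marks a residue as conserved when it is shared by more than a set number of viruses, whereas this sketch requires all rows to be identical for an asterisk. The function names and the toy rows are hypothetical and are not taken from the published alignment.

```python
from typing import List

# Similarity classes as listed in the Fig. 6 legend.
SIMILARITY_GROUPS = ["DE", "NQ", "HRK", "MILV", "AG", "ST", "FYW", "C", "P"]
GROUP_OF = {
    aa: idx
    for idx, members in enumerate(SIMILARITY_GROUPS)
    for aa in members
}


def column_mark(column: str) -> str:
    """Return '*' if all residues are identical, '.' if all share a class."""
    residues = column.upper()
    # Gaps or unknown symbols are never marked (an assumption of this sketch).
    if any(aa not in GROUP_OF for aa in residues):
        return " "
    if len(set(residues)) == 1:
        return "*"
    if len({GROUP_OF[aa] for aa in residues}) == 1:
        return "."
    return " "


def mark_alignment(rows: List[str]) -> str:
    """Build a marker line for equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    return "".join(column_mark("".join(col)) for col in zip(*rows))


# Hypothetical toy alignment for illustration only.
toy_marks = mark_alignment(["PLDLS", "PVDLT", "PIELS"])
```

Reproducing the thresholds used in the figure would mean replacing the all-identical test with a count of matching rows.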

Figure 6A shows a comparison of the amino acid sequences of the predicted E1A proteins of CAd2 with those of Ad5, Ad7, Ad12 (reviewed by van Ormondt and Galibert, 1984), and simian adenovirus type 7 (SA7; Kimelman et al., 1985). The E1 regions of these primate Ads have complete transforming activity, unlike that of CAd2 (Graham et al., 1974; Sekikawa et al., 1978; Shiroki et al., 1977; Kimelman et al., 1985; Tsukiyama et al., 1988). The 232R protein of CAd2 and the E1A proteins encoded in the 13 S mRNAs of the human and simian Ads are aligned. From comparison of amino acid sequences and functional analyses of many E1A mutants, the functional domains of E1A have been proposed: transforming activity was related to regions 1 and 2, and region 3 was related to transactivation of viral early genes (reviewed by Moran and Mathews, 1987). The 232R protein of CAd2 shows partial homology to the conserved regions of human Ads. The first third of region 1 is conserved in CAd2 while the remainder shows no homology. The leftmost 11.3% EcoRI-C fragment of CAd2 could not transform rat 3Y1 cells completely, while the leftmost 45% fragment could cause oncogenic transformation (Tsukiyama et al., 1988). The deletion in region 1 may account for the incomplete transforming activity of CAd2 E1. Regions 2 and 3 and the C-terminal region, of which the terminal five residues are required for the rapid nuclear localization of the E1A proteins of Ad5 (Lyons et al., 1987), are conserved. Interestingly, four cysteine residues in region 3 which form the postulated metal-binding consensus sequence (Berg, 1986) are conserved among human, simian, canine (Fig. 6A), murine (Ball et al., 1988), and tree shrew (Brinckmann et al., 1983) Ads.

[Figure 6, continued: aligned amino acid sequences, panels C and D]

Figure 6B shows the smaller proteins of CAd2, Ad5, Ad7, and Ad12 encoded in the larger E1B mRNAs. Homology between CAd2 and human Ads is partially seen (indicated as regions 1 and 2). The hydrophilicity profiles of region 2 are very similar (remarkably hydrophobic; data not shown) between Ad5 and CAd2. Figure 6C shows the larger E1B proteins. The percentage similarity at the C-terminus among the four Ads is higher than at the N-terminus. Figure 6D shows pIX of CAd2, Ad5, Ad7, and Ad12. The N-terminal halves of these amino acid sequences are highly conserved. Although pIX of CAd2 is considerably smaller than those of human Ads, the deletions are seen mainly in nonconserved regions.

The sequence data of the EcoRI-C fragment show that the organization of the genes is very similar to that of the human Ad E1 genes. However, the EcoRI-C fragment transforms rodent cells incompletely (Tsukiyama et al., 1988), indicating that the CAd2 E1 genes are defective in the function of transformation. Recently, we observed that cells cotransformed with the EcoRI-C fragment and a fragment stretching from 10 to 32.0 m.u. showed the characteristics of a completely transformed phenotype; i.e., the cells were serum-independent, anchorage-independent, and tumorigenic in newborn rats (unpublished data). Mapping and sequencing of the region which complements the transforming function of the CAd2 E1 genes are in progress.
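Homology percentages such as those quoted for the E1A cap-site region, or the N-terminal versus C-terminal similarity noted for the larger E1B proteins, are computed from aligned sequences. The paper does not state how gaps were treated, so the sketch below makes one possible choice: columns with a gap in either row are skipped. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions among columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy rows: 39 matches in 42 compared positions.
toy_identity = round(percent_identity("A" * 42, "A" * 39 + "C" * 3), 1)
```

For illustration only, 39 matches in 42 compared positions give 92.9% and 40 in 42 give 95.2%. Applying the same function to the N-terminal and C-terminal halves of an alignment separately would give the kind of regional comparison described for Fig. 6C.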
ACKNOWLEDGMENTS

We thank Dr. J. Obokata, Hokkaido University, and Dr. R. Padmanabhan, University of Kansas Medical Center, for helpful advice. This work was supported by a Grant-in-Aid for Cancer Research and Grant No. 58480078 from the Ministry of Education, Science and Culture, Japan.

REFERENCES

ALESTROM, P., AKUSJARVI, G., PERRICAUDET, M., MATHEWS, M. B., KLESSIG, D. F., and PETTERSSON, U. (1980). The gene for polypeptide IX of adenovirus type 2 and its unspliced messenger RNA. Cell 19, 671-681.

BALL, A. O., WILLIAMS, M. E., and SPINDLER, K. R. (1988). Identification of mouse adenovirus type 1 early region 1: DNA sequence and a conserved transactivating function. J. Virol. 62, 3947-3957.

BERG, J. M. (1986). Potential metal-binding domains in nucleic acid binding proteins. Science 232, 485-487.

BERK, A. J., and SHARP, P. A. (1978). Structure of the adenovirus 2 early mRNAs. Cell 14, 695-711.

BERK, A. J. (1986). Adenovirus promoters and E1A transactivation. Annu. Rev. Genet. 20, 45-79.

BRINCKMANN, U., DARAI, G., and FLUGEL, R. M. (1983). Tupaia (tree shrew) adenovirus DNA: Sequence of the left-hand fragment corresponding to the transforming early region of human adenoviruses. EMBO J. 2, 2185-2188.

CHOW, L. T., BROKER, T. R., and LEWIS, J. B. (1979). Complex splicing patterns of RNAs from the early regions of adenovirus 2. J. Mol. Biol. 134, 265-303.

DYNAN, W. S., and TJIAN, R. (1983). The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter. Cell 25, 79-87.

FRIEFELD, B. R., LICHY, J. H., FIELD, J., GRONOSTAJSKI, R. M., GUGGENHEIMER, R. A., KREVOLIN, M. D., NAGATA, K., HURWITZ, J., and HORWITZ, M. S. (1984). The in vitro replication of adenovirus DNA. In “Current Topics in Microbiology and Immunology” (W. Doerfler, Ed.), Vol. 110, p. 221. Springer-Verlag, Berlin/New York.

GRAHAM, F. L., VAN DER EB, A. J., and HEIJNEKER, H. L. (1974). Size and location of the transforming region in human adenovirus type 5 DNA. Nature (London) 251, 687-691.

GRAND, R. J. A. (1987). The structure and functions of the adenovirus early region 1 proteins. Biochem. J. 241, 25-38.

GREEN, M., and PINA, M. (1964). Biochemical studies on adenovirus multiplication. VI. Properties of highly purified tumorigenic human adenoviruses and their DNA's. Proc. Natl. Acad. Sci. USA 51, 1251-1259.

HEARING, P., and SCHENK, T. (1986). The adenovirus type 5 E1A enhancer contains two functionally distinct domains: One is specific for E1A and the other modulates all early units in cis. Cell 45, 229-236.

HENIKOFF, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.

IIDA, Y. (1987). DNA sequences and multivariate statistical analysis. Categorical discrimination approach to 5' splice site signals of mRNA precursors in higher eukaryotes' genes. CABIOS 3, 93-98.

KIMELMAN, D., MILLER, J. S., PORTER, D., and ROBERTS, B. E. (1985). E1A regions of the human adenoviruses and of the highly oncogenic simian adenovirus 7 are closely related. J. Virol. 53, 399-409.

LYONS, R. H., FERGUSON, B. Q., and ROSENBERG, M. (1987). Pentapeptide nuclear localization signal in adenovirus E1A. Mol. Cell. Biol. 7, 2451-2456.

MAXAM, A. M., and GILBERT, W. (1980). Sequencing end-labeled DNA with base-specific chemical cleavages. In “Methods in Enzymology” (L. Grossman and K. Moldave, Eds.), Vol. 65, pp. 499-560. Academic Press, New York.

MORAN, E., and MATHEWS, M. B. (1987). Multiple functional domains in the adenovirus E1A gene. Cell 48, 177-178.

NORRBY, E., BARTHA, A., BOULANGER, P., DREIZIN, R. S., GINSBERG, H. S., KALTER, S. S., KAWAMURA, H., ROWE, W. P., RUSSELL, W. C., SCHLESINGER, R. W., and WIGAND, R. (1976). Adenoviridae. Intervirology 7, 117-125.

PERSSON, H., PETTERSSON, U., and MATHEWS, M. B. (1978). Synthesis of a structural adenovirus polypeptide in the absence of viral DNA replication. Virology 90, 67-79.

PETTERSSON, U., and MATHEWS, M. B. (1977). The gene and messenger RNA for adenovirus polypeptide IX. Cell 12, 741-750.

PRUIJN, G. T. M., VAN DRIEL, W., and VAN DER VLIET, P. C. (1986). Nuclear factor III, a novel sequence-specific DNA-binding protein from HeLa cells stimulating adenovirus DNA replication. Nature (London) 322, 656-659.

SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

SEKIKAWA, K., SHIROKI, K., SHIMOJO, H., OJIMA, S., and FUJINAGA, K. (1978). Transformation of a rat cell line by an adenovirus 7 fragment. Virology 88, 1-7.

SHINAGAWA, M., IIDA, Y., MATSUDA, A., TSUKIYAMA, T., and SATO, G. (1987). Phylogenetic relationships between adenoviruses as inferred from nucleotide sequences of inverted terminal repeats. Gene 55, 85-93.

SHIROKI, K., HANDA, H., SHIMOJO, H., YANO, S., OJIMA, S., and FUJINAGA, K. (1977). Establishment and characterization of rat cell lines transformed by restriction endonuclease fragments of adenovirus 12 DNA. Virology 82, 462-471.

TSUKIYAMA, T., SHIBATA, R., KATAYAMA, Y., and SHINAGAWA, M. (1988). Transforming genes of canine adenovirus type 2. J. Gen. Virol. 69, 2471-2482.

VAN DER EB, A. J., and VAN KESTEREN, L. W. (1966). Structure and molecular weight of the DNA of adenovirus type 5. Biochim. Biophys. Acta 129, 441.

VAN ORMONDT, H., and GALIBERT, F. (1984). Nucleotide sequences of adenovirus DNAs. In “Current Topics in Microbiology and Immunology” (W. Doerfler, Ed.), Vol. 110, pp. 3-142. Springer-Verlag, Berlin/New York.

ZIFF, E. B. (1985). Splicing in adenovirus and other animal viruses. Int. Rev. Cytol. 93, 327-358.