GENE
An International Journal on Genes and Genomes
ELSEVIER
Gene 177 (1996) 35-41
Nucleotide sequence of ovine adenovirus tripartite leader sequence and homologues of the IVa2, DNA polymerase and terminal proteins
Sudhanshu Vrati a, D.E. Brookes a, D.B. Boyle b, G.W. Both a,*
a CSIRO Division of Biomolecular Engineering, P.O. Box 184, North Ryde, N.S.W. 2113, Australia
b Australian Animal Health Laboratories, Geelong, Victoria, Australia
Received 20 November 1995; revised 26 February 1996; accepted 27 February 1996
Abstract
Ovine adenovirus OAV287 was previously isolated from sheep in Western Australia. Here we describe a portion of its genome between map units 10.3 and 31.7 which includes major ORFs for homologues of the IVa2 polypeptide and the DNA replication proteins, Terminal protein and DNA polymerase, as well as the N-terminal portion of the 52/55-kDa polypeptide. In addition, as a prelude to possible adaptation of this virus as a vector, we have mapped the elements which make up the tripartite leader sequence of late mRNAs, thereby defining the probable location of the OAV major late promoter. In other human and animal adenovirus genomes, one or two VA RNA genes are encoded between the ORFs for Terminal protein and 52/55-kDa polypeptides. In OAV, these ORFs overlap, suggesting that if VA RNA genes are present, they may lie elsewhere in the OAV genome.
Keywords: Major late promoter; RNA splicing; VA RNA genes; Sheep
1. Introduction
Adenoviruses (Ads) are widely distributed among mammals and many serologically distinct viruses have been identified (Ishibashi and Yasue, 1984). In sheep, adenoviruses have been classified into at least six serotypes by cross-neutralization tests (Adair et al., 1986; Boyle et al., 1994). At least two genetically and serotypically distinct OAV groups appear to circulate in sheep in Australia (Boyle et al., 1994). We have begun the molecular characterization of an isolate, OAV287, from one group (tentatively classified as serotype 7; Boyle et al., 1994). This OAV genome is only ~29.5 kb in length (compared with ~34 kb for the other group), which makes it the smallest Ad genome described to date. We previously described a portion of the OAV genome coding for several proteins including pVIII and
* Corresponding author. Tel. +61 2 98864969; Fax +61 2 98864818; e-mail: [email protected].
Abbreviations: aa, amino acid; Ad, adenovirus; bp, base pair(s); cDNA, DNA complementary to mRNA; DNA pol, DNA polymerase; kb, kilobase(s); mORF, major open reading frame; MLP, major late promoter; m.u., map unit(s); nt, nucleotide(s); OAV, ovine adenovirus; PCR, polymerase chain reaction; TLS, tripartite leader sequence; TP, terminal protein; VA, virus-associated.
0378-1119/96/$15.00 © 1996 Elsevier Science S.A. All rights reserved
PII 0257-8972(96)00266-1
fiber. The E3 region, which lies between these mORFs in other Ads, is missing from this location in OAV (Vrati et al., 1995), emphasizing the unusual arrangement of genes in this genome. It was therefore of interest from an evolutionary and practical viewpoint to examine other regions of the genome. In the present work we describe the sequence and arrangement of the genome between m.u. 10.3 and 31.7. This region codes for the OAV DNA replication proteins, TP and DNA pol, the IVa2 protein and the N-terminus of the 52/55-kDa protein. Some RNA transcription landmarks, including the tripartite leader sequences, have also been mapped.
2. Experimental and discussion 2.1. Nucleotide sequence and genome arrangement
OAV287 sequences presented here are contained within the BamHI D and B fragments (Boyle et al., 1994), which are located in the genome in that order between the left-hand end and map unit 37.1. The nucleotide sequence of the region between map units 10.3 and 31.7 (1 m.u. = ~295 base pairs) is described in Fig. 1. As the complete nucleotide sequence has now been determined (Vrati et al., 1996) the numbering reflects the true distance from the terminal nucleotide. Except for the 52/55-kDa protein, the mORFs described here are encoded on the complementary strand, from right to left in the order tp, dna pol and iva2. The genome arrangement is depicted in Fig. 2.

S. Vrati et al./Gene 177 (1996) 35-41

[Fig. 1: nucleotide sequence listing with deduced amino acid translations; the listing is not recoverable here. Legible annotations include the starts of the IVa2, DNA pol and 52/55-kDa mORFs, the TLS, the splice junction, the Ad2 insertion point and protease cleavage sites.]

Fig. 1. Nucleotide sequence of the OAV287 genome between m.u. 10.3 and 31.7 coding for iva2, dna pol, tp and 52/55-kDa homologues (GenBank accession No. U31557). Nucleotide numbering reflects the actual position in the complete genome. Predominantly the complementary (right to left) strand is shown except where the positions of the TLS (underlined) and the 52/55-kDa sequences are indicated as (+). Potential polyadenylation signals (AATAAA) are also underlined. ( ~ ) and (~) indicate experimentally determined termination points for plus stranded transcripts coming from a leftward promoter and the splice junction of the 52/55-kDa transcript with the TLS, respectively. DNA pol regions I-VI (Wang et al., 1989) and the OAV residues conserved within these regions (individually underlined) are indicated. (*) signifies the location where Ad2 DNA pol carries a twenty-three residue insertion relative to OAV. Underlined groups of aas signify possible zinc finger motifs with highly conserved aa in bold type. (It) indicates possible cleavage sites for OAV protease. Methods: Nucleotide sequences were determined on both strands of the genome, initially by generating nested deletion clones (Pharmacia) but later by using oligonucleotide primers

[Fig. 2: genome map with scales in m.u. (20-100) and kb; labelled features are TLS, ORF1, 52/55k, IVa2, pTP and DNA Pol, with (+) and (-) strand indicators.]

Fig. 2. Summary of the genome arrangement in OAV between map units 10.3 and 31.7 corresponding to nucleotides 3061-9360 of the complete genome.
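The map-unit convention used in this section can be made concrete with a short sketch. It assumes a genome length of 29,500 bp (the ~29.5 kb quoted in the Introduction), so that 1 m.u. is ~295 bp; the function names are illustrative only. Because the genome length is approximate, the converted positions agree with the stated nucleotide range (3061-9360) only to within a few tens of bases.

```python
# Assumption: approximate OAV287 genome length (~29.5 kb, as quoted in the text).
GENOME_LENGTH_BP = 29_500


def mu_to_nt(map_units: float, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Convert map units (0-100) to an approximate nucleotide position."""
    return map_units * genome_length / 100.0


def nt_to_mu(position: float, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Convert a nucleotide position to map units (0-100)."""
    return 100.0 * position / genome_length


if __name__ == "__main__":
    # Boundaries of the region described in this paper.
    for mu in (10.3, 31.7):
        print(f"{mu} m.u. is approximately nt {mu_to_nt(mu):.0f}")
```

The small discrepancy from the published coordinates is expected, since the sketch uses a rounded genome length rather than the exact one.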
2.2. Mapping of RNA transcripts
An AATAAA polyadenylation signal (Proudfoot and Brownlee, 1976) is located at bases 3055-3060 at the end of a long open reading frame which is present 5' to the sequence (see GenBank accession No. U40839). By PCR analysis and sequencing it was determined that left-to-right transcripts containing this mORF terminate at positions 3101-3103 and 3110 (Fig. 1). On the complementary strand, the AATAAA sequence at the end of the iva2 mORF (Fig. 1) probably terminates transcripts near base 3030, as shown above and previously (Vrati et al., 1995). Transcripts incorporating dna pol may also use this polyadenylation signal as no other is present at the 3' end of that mORF (Fig. 1). In contrast, the tp mORF does contain a polyadenylation signal, suggesting that these transcripts may terminate independently of those for iva2 and dna pol, unlike the situation in the Ad2 genome where these transcripts share a common 3' end. The promoter(s) from which tp and dna pol transcripts are derived has not been mapped.

The Ad genome also encodes the MLP on the left-to-right strand within the mORF for dna pol. The MLP drives expression of most of the late Ad proteins, producing RNA transcripts which contain a common 5' TLS (reviewed in Horwitz, 1990). The OAV TLS was mapped as follows. In Ad2 and Ad5 genomes the TATA box element in the MLP is preceded by a binding site (CACGTG) for a transcription factor known as USF (Sawadogo and Roeder, 1985). We scanned the OAV sequence for similar sequence motifs and found candidates on the left-to-right (+) strand beginning at nt 4965 and 4983 (Fig. 1). It was anticipated that exon 1 of the TLS would begin ~30 nt downstream of the TATA box. Therefore a PCR primer incorporating this region was made. cDNA synthesized from total OAV-infected cell mRNA using a primer complementary to mORF iiia (Vrati et al., 1996) was PCR amplified using the exon 1/IIIa primer pair. The DNA was cloned and sequenced. This confirmed that the TLS was composed of three segments and identified the junctions between them (Fig. 1) and between TLS and iiia mRNA transcripts. Finally a primer from within TLS exon 3 was used to synthesize cDNA from late mRNAs. The products were run in parallel with a standard M13 sequencing reaction. The strong stop cDNA product generated from the 5' end of the mRNAs allowed the 5' terminal base of TLS exon 1 to be deduced to within one nucleotide (Fig. 1).

The arrangement of the TLS is significantly different between the OAV and human Ad2 genomes. TLS exon 1 is adjacent to the MLP in both genomes. However, Ad2 TLS exon 2 is located within the dna pol gene while OAV exon 2 is located in the tp mORF, only 205 nucleotides from TLS exon 3 (Fig. 1). TLS exon 3 for Ad2 is ~30 nucleotides longer than that for OAV but the 3' splice junction for both exons occurs at the same relative point within the TP gene. The tripartite arrangement of the TLS has been retained despite their different locations, although it is not obvious what advantage this provides in viral replication. Sequences complementary to a portion of OAV TLS exon 2 also code for the TP nuclear localisation signal. The location of OAV TLS exon 1 strongly suggests that the MLP lies in the vicinity of nt 4950-5000.
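The kind of motif scan described above (USF site CACGTG, a TATA box, and AATAAA polyadenylation signals on either strand) can be sketched as follows. This is a minimal illustration, not the procedure the authors used: the helper names and the toy sequence are hypothetical, the toy sequence is synthetic rather than OAV sequence, and `TATAAA` is used only as an example TATA-like pattern.

```python
# Hypothetical toy sequence (not OAV): CACGTG at nt 3, TATAAA at nt 11,
# and TTTATT (the reverse complement of AATAAA) at nt 20.
TOY_SEQ = "GGCACGTGAATATAAAAGGTTTATTCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list:
    """Return 1-based start positions of every (possibly overlapping) match."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = seq.find(motif, start + 1)
    return hits


def scan_both_strands(seq: str, motif: str) -> dict:
    """Report hits per strand as 1-based plus-strand coordinates.

    For minus-strand hits the coordinate is the leftmost plus-strand base
    covered by the motif.
    """
    n, m = len(seq), len(motif)
    plus = find_motif(seq, motif)
    # A hit starting at position p on the reverse complement covers
    # plus-strand positions (n - p - m + 2) .. (n - p + 1).
    minus = sorted(n - p - m + 2 for p in find_motif(reverse_complement(seq), motif))
    return {"+": plus, "-": minus}


if __name__ == "__main__":
    for motif in ("CACGTG", "TATAAA", "AATAAA"):
        print(motif, scan_both_strands(TOY_SEQ, motif))
```

Note that CACGTG is its own reverse complement, so every occurrence is reported on both strands; strand assignment for the USF site therefore has to come from its position relative to the TATA box rather than from the motif itself.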
(Fig. 1, Methods, continued) which were synthesized on the basis of newly determined sequence (Vrati et al., 1995). Sequences were determined manually (Sanger et al., 1977) or using an Applied Biosystems DNA sequencer. To map splice junctions and transcript termination points, total RNA was prepared from uninfected and OAV287-infected CSL503 cells as previously described (Vrati et al., 1995). RNA was copied into cDNA using AMV reverse transcriptase (Vrati et al., 1995) and oligo dT20 or a specific primer complementary to the mORF of interest. cDNA was amplified by PCR using Taq DNA polymerase and an appropriate primer pair (Vrati et al., 1995). The amplified DNA was cloned directly using the pGEM-T vector (Promega, Madison, WI, USA) and sequenced.

Although there is little direct nucleotide and amino acid sequence homology with other Ad MLPs in this region, the combination of sequence motifs strongly indicates promoter function. The binding site for transcription factor USF and a TATA box element are followed by a GTGGAAA element which has an enhancer function in other contexts (Weiher et al., 1983). In addition we have demonstrated promoter function of this region by using it to drive gene expression in other contexts (A. Khatri and G.W. Both, unpublished data). Thus, the OAV MLP/TLS falls at the same relative position within the dna pol mORF as it does in other Ads. In other Ad genomes additional MLP control elements, which can enhance expression, lie downstream of exon 1 of the TLS (Leong et al., 1990; Mason et al., 1990). Nucleotide sequences present in the equivalent region of the OAV genome show no obvious homology with these elements. Promoters for the iva2, dna pol and tp genes have not been mapped but it is likely, based on analogies with human Ads, that the one for iva2 lies 5' to, but adjacent to, the MLP while the E2 promoter for dna pol and tp lies in the vicinity of m.u. 72 (Natarajan et al., 1985; Shu et al., 1988).

Lastly, in contrast to the human Ad genomes, the 5' ends of the OAV tp (on the negative strand) and 52/55-kDa (+ strand) mORFs overlap by 39-52 codons, depending on whether tp transcripts are spliced to upstream sequences. The splice site between the TLS and 52/55-kDa transcripts was mapped 3' to the AG dinucleotide adjacent to the ATG of the mORF (Fig. 1). Thus, the nearby ATG is the first available for translation downstream from the TLS sequences.

The relevant features of each protein are discussed below in relation to what is known about the homologues from other Ads. For the most part, Ad2, Ad12 and Ad40 have been compared with OAV as these viruses are somewhat different from each other. Features which are also conserved in these viruses and in OAV may indicate functional significance.
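The overlap between two mORFs on opposite strands, as described above for tp and 52/55-kDa, reduces to an interval intersection once both are written in plus-strand coordinates. The sketch below uses hypothetical coordinates chosen to give a 39-codon overlap; they are not the OAV coordinates, and the function names are illustrative.

```python
def overlap_nt(a: tuple, b: tuple) -> int:
    """Shared nucleotides between two inclusive, 1-based (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlap_codons(a: tuple, b: tuple) -> int:
    """Overlap expressed in whole codons (three nucleotides each)."""
    return overlap_nt(a, b) // 3


if __name__ == "__main__":
    # Hypothetical plus-strand footprints of a minus-strand ORF and a
    # plus-strand ORF whose 5' ends face each other.
    minus_strand_orf = (1, 300)
    plus_strand_orf = (184, 600)
    print(overlap_nt(minus_strand_orf, plus_strand_orf), "nt")
    print(overlap_codons(minus_strand_orf, plus_strand_orf), "codons")
```

Because the two ORFs are read in opposite directions, "codons" here is simply the overlap length divided by three; the reading frames themselves need not coincide.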
2.3. IVa2 protein
The OAV IVa2 protein homologue (mORF of 446 aa, 51.2 kDa) is only 58% similar and 39.7% identical to that of Ad2 (determined using the GCG Bestfit programme and standard parameter settings, as described previously; Vrati et al., 1995). The iva2 and dna pol mORFs overlap significantly but the true extent of this is presently unknown as any splice junctions with upstream sequences have not been mapped. As the first methionine of IVa2 is located ~120 aa into the mORF, it is very likely that the initiation codon is provided by sequences nearer the MLP which are spliced to the mORF transcript, as occurs in human Ads.
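The distinction between percent similarity and percent identity quoted above can be illustrated with a small sketch that scores an existing pairwise alignment. This is not the GCG Bestfit algorithm: the conservative-substitution groups below are an assumption made for illustration (Bestfit derives similarity from a scoring matrix), and the aligned strings are toy data.

```python
# Assumption: a simple grouping of chemically similar residues, used here
# in place of the scoring-matrix threshold applied by alignment programmes.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def _is_similar(x: str, y: str) -> bool:
    return any(x in group and y in group for group in SIMILAR_GROUPS)


def identity_similarity(aln_a: str, aln_b: str) -> tuple:
    """Return (% identity, % similarity) over ungapped alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not scored
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif _is_similar(x, y):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    ident, simil = identity_similarity("MKV-LDE", "MRVALNE")
    print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Similarity is always at least as large as identity under this definition, which is consistent with the pattern of the figures reported throughout the paper.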
2.4. DNA polymerase
OAV DNA pol is encoded by a long ORF which has a methionine near its N-terminus (Fig. 1). However, this ATG codon is in a weak context for initiation (Kozak, 1987) and upstream sequences may be linked to the mORF. In human Ads, the initiating Met-Ala-Leu tripeptide, which is shared with the precursor to TP, lies within the E2B segment at m.u. 39 (Shu et al., 1988; reviewed in Horwitz, 1990). OAV DNA pol is essentially colinear with DNA pol from other Ads. Only a few small deletions or insertions are evident. However, relative to Ad2, OAV DNA pol contains a significant internal deletion of 23 residues near its N-terminus (nt 7092, Fig. 1). The OAV mORF (1071 aa) shares 64.5% similarity and 45.2% identity with Ad2 but, relative to Ad2 DNA pol, is truncated by about 80 residues at its amino terminus. Ad2 DNA pol contains a serine residue in this region which is a major phosphorylation site thought to be important for enzyme activity (Ramachandra et al., 1993) and three motifs with basic charge which are thought to comprise the nuclear localisation signal (Zhao and Padmanabhan, 1991). These motifs are poorly conserved between Ads 2, 40 and 12 and in the latter, the third region is almost unrecognizable, consistent with its apparent dispensability (Zhao and Padmanabhan, 1991). Attempts to locate homologous sequences in OAV external to the mORF have been unsuccessful. OAV DNA pol could possibly be targeted to the nucleus by interaction with the TP precursor, which also has a nuclear localisation signal (Zhao and Padmanabhan, 1988; and see below).

There are two Cys/His-rich regions in DNA pol (near nt 4600 and 7020, Fig. 1), which may form Zn++ fingers to facilitate template binding by the enzyme. Many viruses carrying mutations in these regions have been constructed and extensive studies on their function have been carried out using a range of assays (Joung and Engler, 1992; Joung et al., 1991; Roovers et al., 1993). The N-terminal proximal region (C-X2-C-X7-H-C-X9-H-X3or4-H) does not conform precisely to a Zn++-binding motif and alignment of OAV and human DNA pol sequences shows that the spacing between the last pair of His residues is not conserved. There is also very high conservation of the X9 sequence (SxRRRDFYf/y; conserved aas in capitals) between the last Cys-His pair. For the C-terminally located Ad2 motif C-X3-C-X2-C-X22-H-C-X2-C all the Cys residues are conserved between human Ads 2, 12, 40 and OAV (Fig. 1). However, the underlined Cys and His residues appear to be unimportant for function by several assays (Joung and Engler, 1992) and the His is not present in OAV (Fig. 1).
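The spacing patterns given above for the two Cys/His-rich clusters translate directly into regular expressions. The sketch below is illustrative only: the pattern and function names are hypothetical, and the test sequences are synthetic strings built to match the spacings, not OAV or Ad2 protein sequence.

```python
import re

# N-terminal proximal cluster: C-X2-C-X7-H-C-X9-H-X3or4-H
N_PROXIMAL = re.compile(r"C.{2}C.{7}HC.{9}H.{3,4}H")
# C-terminal cluster (Ad2 form): C-X3-C-X2-C-X22-H-C-X2-C
C_TERMINAL = re.compile(r"C.{3}C.{2}C.{22}HC.{2}C")
# Conserved X9 segment, SxRRRDFYf/y (last position F or Y)
X9_CORE = re.compile(r"S.RRRDFY[FY]")

# Synthetic examples (assumptions for illustration, not real sequences).
N_TOY = "MA" + "C" + "AA" + "C" + "A" * 7 + "H" + "C" + "SGRRRDFYF" + "H" + "AAA" + "H" + "GG"
C_TOY = "C" + "AAA" + "C" + "AA" + "C" + "A" * 22 + "H" + "C" + "AA" + "C"


def find_clusters(protein: str, pattern: re.Pattern) -> list:
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    print(find_clusters(N_TOY, N_PROXIMAL))
    print(find_clusters(N_TOY, X9_CORE))
    print(find_clusters(C_TOY, C_TERMINAL))
    # Replacing the His shows that a strict pattern no longer matches,
    # as would be the case for a sequence lacking that residue.
    print(find_clusters(C_TOY.replace("H", "Q"), C_TERMINAL))
```

A strict pattern of this kind would miss a homologue in which one position has changed, so for comparative work a relaxed pattern or an alignment-based check is the safer choice.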
DNA pols from widely diverse sources contain six regions in which certain residues are conserved (Wang et al., 1989). The highly conserved region I (YGDTDSLF) is absolutely conserved in the OAV DNA pol (Fig. 1). In regions II and V the conserved residues identified (Wang et al., 1989) are also conserved in OAV DNA pol (Fig. 1, underlined single aas) but in regions III and IV this is not the case, although some changes are conservative. The preservation of such residues in the OAV enzyme, which shows significant sequence divergence from human Ad DNA pols, further emphasises their probable structural or functional importance.

Also present within the mORF of human Ad2 dna pol, but on the complementary (+) strand (between bases 7968 and 8417), is a ~13.6-kDa ORF. A similar ORF is also conserved in human Ads 12 and 40, although whether the protein is expressed in human Ad-infected cells has not been determined. In the OAV genome this reading frame is not conserved. The longest ORF in the region is only 79 aas in length, it contains no methionine residue and shows only 40% similarity with the 13.6-kDa ORF of Ad2.
2.5. Terminal protein
OAV and Ad2 TPs are ~57% similar and 33% identical but, depending on mRNA splicing events, OAV TP may be at least 50-60 aas longer at the N-terminus, or considerably shorter as its second methionine occurs 167 aas downstream. The motif YSRLRYT, which is conserved among human Ad TPs, hepatitis B core antigen and several bacteriophages (Hsieh et al., 1990), is poorly conserved in OAV (YSnisYk). In particular, neither of the arginine residues (underlined) is retained. These changes are inconsistent with the model for protein-primed initiation of DNA replication in which it was proposed that the guanidinium group of the first Arg may be involved in binding the chain-initiating nucleotide (Hsieh et al., 1990). Following assembly of the relevant proteins into a pre-initiation complex at the origin of replication in the genome, DNA pol catalyses the transfer of dCMP to Ser residue 580 of the intact terminal protein (Smart and Stillman, 1982). This serine residue and the sequences adjacent to it (DIDSVEi) are highly conserved in OAV TP. TP is also required in the nucleus for Ad DNA replication and the Ad2 sequence RLPV(R)6VP is thought to comprise a nuclear localisation sequence (Zhao and Padmanabhan, 1988). This sequence is partially retained in OAV in the form of RLPVnRRqRv.

The C-terminal sequence of TP is highly conserved among human Ads, suggesting that it might be functionally important. However, deletion of seventeen residues did not affect function in several assays (Roovers et al., 1993). Consistent with this, the C-terminal 32 residues of human Ad terminal proteins are completely missing in the OAV protein.

Ad TP is produced as a precursor, pTP, which is cleaved by the viral 23-kDa protease via an intermediate form (iTP) to mature TP (Smart and Stillman, 1982). The sites of cleavage (A, B, and C) of Ad2 pTP have recently been determined (Webster et al., 1994). Cleavage at site A (aa 175) generates iTP.
The equivalent site is conserved in all human Ads as well as in OAV. Site B, which is only eight residues C-terminal to site A, is peculiar to human Ads 2 and 5 and generates an alternative iTP. Cleavage at site C in Ad2 (aa 404) generates mature TP. Site C is conserved in human Ads but not in OAV, although alternative sites exist nearby in OAV (Fig. 1). Whether one or both of these are used has not been determined.
2.6. VA RNAs
In most Ad genomes one or two genes coding for VA RNAs are located at m.u. ~30 between the 52/55-kDa and Terminal protein genes (reviewed in Ma and Mathews, 1993). These are produced by RNA polymerase III transcription as positive regulators of virus replication which act at the translational level to neutralise a cellular defense mechanism which is interferon-induced (Mathews and Shenk, 1991). As the OAV tp and 52/55-kDa mORFs overlap (Figs. 1 and 2), there is no room in the genome to code for the VA RNAs at this site unless the VA gene(s) overlap both mORFs. In the avian (CELO) adenovirus the VA RNAs are derived by leftward transcription from sequences at m.u. 90 (Larsson et al., 1986). It is possible that OAV VA RNAs are also encoded elsewhere in the genome and this is under investigation.
3. Conclusions
The arrangement of this portion of the OAV genome shows similarities with that of human Ads, but the OAV homologues differ considerably in nucleotide and amino acid sequence. The identification of the OAV MLP and TLS sequences should facilitate expression of foreign genes if the virus can be adapted as a vector.
Acknowledgements
This work was supported by a grant from the Australian Wool Research and Promotion Organisation.
References
Adair, R.M., McKillop, E.R. and Coackley, B.H. (1986) Serological identification of an Australian adenovirus isolate from sheep. Aust. Vet. J. 63, 162.
Boyle, D.B., Pye, A.D., Kockerhans, R., Adair, B.M., Vrati, S. and Both, G.W. (1994) Characterisation of Australian ovine adenovirus isolates. Vet. Microbiol. 41, 281-291.
Horwitz, M.S. (1990) Adenoviridae and their replication. In: B.N. Fields and D.M. Knipe (Eds.), Virology. Raven Press, New York, pp. 1679-1721.
Hsieh, J.-C., Yoo, S.-K. and Ito, J. (1990) An essential arginine residue for the initiation of protein-primed DNA replication. Proc. Natl. Acad. Sci. USA 87, 8665-8669.
Ishibashi, M. and Yasue, H. (1984) Adenoviruses of animals. In: H.S. Ginsberg (Ed.), The Adenoviruses. Plenum Press, New York, pp. 497-562.
Joung, I. and Engler, J.A. (1992) Mutations in 2 cysteine-histidine-rich clusters in adenovirus type-2 DNA polymerase affect DNA binding. J. Virol. 66, 5788-5796.
Joung, I., Horwitz, M.S. and Engler, J.A. (1991) Mutagenesis of conserved region I in the DNA polymerase from human adenovirus serotype 2. Virology 184, 235-241.
Kozak, M. (1987) At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells. J. Mol. Biol. 196, 947-950.
Larsson, S., Bellett, A. and Akusjarvi, G. (1986) VA RNAs from avian and human adenoviruses: Dramatic differences in length, sequence and gene location. J. Virol. 58, 600-609.
Leong, K., Lee, W. and Berk, A.J. (1990) High-level transcription from the adenovirus major late promoter requires downstream binding sites for late-phase-specific factors. J. Virol. 64, 51-60.
Ma, Y. and Mathews, M.B. (1993) Comparative analysis of the structure and function of adenovirus virus-associated RNAs. J. Virol. 67, 6605-6617.
Mason, B.B., Davis, A.R., Bhat, B.M., Chengalvala, M., Lubeck, M.D., Zandle, G., Kostek, B., Cholodofsky, S., Dheer, S., Molnar-Kimber, K., Mizutani, S. and Hung, P.P. (1990) Adenovirus vaccine vectors expressing hepatitis B surface antigen: Importance of regulatory elements in the adenovirus major late intron. Virology 177, 452-461.
Mathews, M.B. and Shenk, T. (1991) Adenovirus virus-associated RNA and translational control. J. Virol. 65, 5657-5662.
Natarajan, V., Madden, M.J. and Salzman, N.P. (1985) Positive and negative control sequences within the distal domain of the adenovirus IVa2 promoter overlap with the major late promoter. J. Virol. 55, 10-15.
Proudfoot, N.J. and Brownlee, G.G. (1976) 3' non-coding region sequences in eukaryotic mRNAs. Nature 263, 211-214.
Ramachandra, M., Nakano, R., Mohan, P.M., Rawitch, A.B. and Padmanabhan, R. (1993) Adenovirus DNA polymerase is a phosphoprotein. J. Biol. Chem. 268, 442-448.
Roovers, D.J., Vanderlee, F.M., Vanderwees, J. and Sussenbach, J.S. (1993) Analysis of the adenovirus type-5 terminal protein precursor and DNA polymerase by linker insertion mutagenesis. J. Virol. 67, 265-276.
Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
Sawadogo, M. and Roeder, R.G. (1985) Interaction of a gene-specific transcription factor with the adenovirus major late promoter upstream of the TATA box region. Cell 43, 165-175.
Shu, L., Pettit, S.C. and Engler, J.A. (1988) The precise structure and coding capacity of mRNAs from early region E2B of human adenovirus serotype 2. Virology 165, 348-356.
Smart, J.E. and Stillman, B.W. (1982) Adenovirus terminal protein precursor. Partial amino acid sequence and the site of covalent linkage to virus DNA. J. Biol. Chem. 257, 13499-13506.
Vrati, S., Boyle, D.B., Kockerhans, R. and Both, G.W. (1995) Sequence of ovine adenovirus 100k hexon assembly, 33k, pVIII and fiber genes: early region E3 is not in the expected location. Virology 209, 400-408.
Vrati, S., Brookes, D.E., Strike, P., Khatri, A., Boyle, D.B. and Both, G.W. (1996) Unique genome arrangement of an ovine adenovirus: Identification of new proteins and proteinase cleavage sites. Virology 220, 186-199.
Wang, T.S.-F., Wong, S.W. and Korn, D. (1989) Human DNA polymerase alpha: predicted functional domains and relationships with viral DNA polymerases. FASEB J. 3, 14-21.
Webster, A., Leith, I.R. and Hay, R.T. (1994) Activation of adenovirus-coded protease and processing of preterminal protein. J. Virol. 68, 7292-7300.
Weiher, H., Konig, M. and Gruss, P. (1983) Multiple point mutations affecting the SV40 enhancer. Science 219, 626-631.
Zhao, L.-J. and Padmanabhan, R. (1988) Nuclear transport of adenovirus DNA polymerase is facilitated by interaction with preterminal protein. Cell 55, 1005-1015.
Zhao, L.-J. and Padmanabhan, R. (1991) Three basic regions in adenovirus DNA polymerase interact differentially depending on the protein context to function as bipartite nuclear localization signals. New Biol. 3, 1074-1088.