Biochimica et Biophysica Acta 867 (1986) 114-123
Elsevier
BBA 91591
Computer analyses on the structure of junction sites between adenovirus DNA and cellular DNA
Jochen Schellner a, Kurt Stüber b and Walter Doerfler a,*
a Institute of Genetics, University of Cologne, Weyertal 121, D-5000 Cologne 41, and b Max-Planck-Institut für Züchtungsforschung, Cologne (F.R.G.)
(Received February 13th, 1986)
Key words: Viral genome integration; Junction site; Nucleic acid sequence; Computer analysis; (Adenovirus)
In previous work we have cloned and determined the nucleotide sequence of sites of linkage between mammalian cell DNA and foreign (viral) DNA. These investigations have been performed to study details of the mechanism of recombination in mammalian cells. Cloned lines of adenovirus-transformed cells have been used in the analyses because they constituted cell populations in which the foreign DNA had been fixed at certain sites in the cellular genomes. In the present investigation, these nucleotide sequences at sites of linkage have been subjected to computer-aided analyses. A number of sequence motifs have been determined; sequence features common to all junction sites have not been discernible. In some instances, patch homologies have been detected. At several sites of junction, the cellular DNA sequences seem to be transcriptionally active, even in cells that do not carry foreign DNA. Transcriptional activity may be a necessary but perhaps not sufficient precondition for recombination of mammalian DNA sequences with foreign DNA.
* To whom correspondence should be addressed.
Abbreviations: Ad2, human adenovirus type 2; Ad12, human adenovirus type 12; CBA-12-1-T, an Ad12-induced mouse tumor; CLAC1, an Ad12-induced hamster tumor cell line; CLAC3, an Ad12-induced hamster tumor cell line; HE5, an Ad2-transformed hamster cell line; KB cells, a human epithelial cell line; SYREC2, a symmetric recombinant between Ad12 and human cell DNAs; T1111(2), an Ad12-induced hamster tumor line.

Introduction
In previous work, we have cloned and sequenced a set of junction sites between cellular and foreign (adenovirus) DNA from adenovirus type 2 (Ad2)- or type 12 (Ad12)-transformed cell lines (for review, see Ref. 1). Transformed cell lines were used in this study on the structure of recombination sites, because they represented clonal cell lines. The aim of this series of investigations has been to elucidate the mechanism of recombination between mammalian cell DNA and foreign DNA introduced into the genome of these cells. Comparison of the nucleotide sequences at eight different sites of linkage between adenovirus DNA and human, hamster or mouse DNA has not revealed common sequences that could have been interpreted as signals for recombination. Patch homologies have been described [1-5], and we have concluded that such patch homologies between cellular and foreign DNA, though possibly ubiquitous, could be important to stabilize intermediates in DNA recombination. In the present communication, these structural analyses on viral-cellular DNA junction sites have been extended using computer-aided methods. Nucleotide sequences from all junction sites determined in our laboratory have been subjected to a search for a series of genetic signals. The data
0167-4781/86/$03.50 © 1986 Elsevier Science Publishers B.V. (Biomedical Division)
[Fig. 1 graphic not reproduced. Symbols in the key: inverted repeats (1, 2, 3, ...); direct repeats (I, II, ...); patches (A, ...); TATA box; GGGCGGG (incl. mismatch); CCGG; CG; GAGG (up to 40 bp); CAGG (up to 40 bp); GCGC; open reading frames (ORF) on either strand (5' to 3'); positions along the junction, viral and cell sequences in bp.]
Fig. 1. General survey of signals analyzed in individual junction sequences. This scheme represents the key to individual signals investigated. The key to individual sequences is given in Table II.
TABLE I
NUCLEOTIDE SEQUENCES AT JUNCTION SITES BETWEEN Ad12 OR Ad2 DNA AND MAMMALIAN CELL DNAs
A summary of these data has been presented previously [1].

Source of sequence                                                 Figure   Ref.
CLAC3 (Ad12-induced hamster tumor line)                            2        2
HE5 (Ad2-transformed hamster cell line)                            3        3, 6, 9
CLAC1 (Ad12-induced hamster tumor line)                            4        4
SYREC2 (symmetric recombinant between Ad12 and human cell DNAs)    5        7, 8
CBA-12-1-T (Ad12-induced mouse tumor)                              6        5
T1111(2) (Ad12-induced hamster tumor line)                         7        1

Fig. 2. Computer analyses of the left junction between Ad12 and hamster cell DNA from the Ad12-induced hamster tumor line CLAC3 [2].
Fig. 3. Computer analyses of the left (a) and right (b) junction sites between Ad2 DNA and hamster cell DNA from the Ad2-transformed hamster cell line HE5 [3,6,9]. A sequence of 300 nucleotide pairs at the preinsertion site (c) has also been analyzed.
fail to provide evidence for common signals or an obvious uniform structure that could be implicated in directing the recombination process. On the other hand, the decisive structures directing recombination may well be sought in DNA-protein complexes, and their detection may not yet be directly amenable to nucleotide sequence analyses.

Materials and Methods

Nucleotide sequences at junction sites
These sequences were determined earlier [1-9], and their origin is summarized in Table I. The sequences themselves are not reproduced in this report; they have been published previously (for review, see Ref. 1).

Computer programs
The following computer programs were applied: 'Structure' to search for direct and inverted repeats [10]; 'patch homology' to scan sequences for nucleotide homologies of 8 or more nucleotides [1]; and 'frames' to detect open reading frames coding for potential polypeptides [11]. The search programs were run on a VAX 11/750 computer (Digital Equipment Corporation).

Results
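The searches for patch homologies and for direct and inverted repeats named under 'Computer programs' can be illustrated with a short Python sketch. It is not the original 'Structure' or 'patch homology' program, whose algorithms are not described in this report. The function names, the seeding strategy based on short exact words, and the toy sequences are assumptions made purely for illustration; only the minimum length of 8 nucleotides is taken from the text.

```python
from collections import defaultdict

# Illustrative sketch only: names and the seeding strategy are assumptions,
# not the programs cited in the text.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def patch_homologies(a, b, min_len=8):
    """Maximal exact matches of at least min_len nucleotides between a and b.

    Returns a list of (start_in_a, start_in_b, length) tuples, 0-based.
    """
    # Index every word of length min_len in b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - min_len + 1):
        index[b[j:j + min_len]].append(j)

    hits = []
    for i in range(len(a) - min_len + 1):
        for j in index.get(a[i:i + min_len], ()):
            # Skip seeds that only continue a match starting further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right while the sequences agree.
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            hits.append((i, j, length))
    return hits


def direct_repeats(seq, min_len=8):
    """Exact repeats in the same orientation: (first_start, second_start, length)."""
    return [(i, j, n) for i, j, n in patch_homologies(seq, seq, min_len) if i < j]


def inverted_repeats(seq, min_len=8):
    """Segments whose reverse complement occurs further right in seq.

    Returns (left_start, right_start, length) tuples; a palindrome that is
    its own partner at a single position is not reported.
    """
    rc = reverse_complement(seq)
    pairs = []
    for i, j, n in patch_homologies(seq, rc, min_len):
        partner = len(seq) - j - n  # map the coordinate in rc back onto seq
        if i < partner:
            pairs.append((i, partner, n))
    return pairs


if __name__ == "__main__":
    # Toy sequences, not real junction data.
    viral = "TTTACGGTCATGACC"
    cellular = "GGACGGTCATGAAAA"
    print(patch_homologies(viral, cellular))
```

In this sketch a patch homology is reported once, at its leftmost seed, and then extended to its full length, so a 10-nucleotide match yields one hit rather than three overlapping 8-nucleotide hits. Mismatches within a patch are not tolerated here; a search allowing them would need a different approach.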
The junction sites whose nucleotide sequences were analyzed in detail are listed in Table I. The nucleotide sequences have been scanned for a number of signals, which are schematically designated in Fig. 1: direct and inverted nucleotide sequence repeats, patch homologies between viral and cellular nucleotide sequences, patch homologies between cellular sequences at the preinsertion site, and open reading frames on either DNA strand. In addition, the nucleotide sequences 5'-GGGCGGG-3' (a promoter motif), TATAA sequences, 5'-GCGC-3' (HhaI recognition sequence), 5'-CCGG-3' (HpaII, MspI recognition sequence), 5'-CG-3', 5'-GAGG-3' and 5'-CAGG-3' (both presumably found at sites of recombination) have been localized in the viral and cellular sequences at the junction sites. The total cellular sequences determined at individual junction sites have been included in the analyses. Corresponding inverted repeats are designated by Arabic numerals, direct repeats by Roman numerals, and patch homologies by capital letters. Patch homologies have been cross-checked between different junction sequences (Figs. 2-7). A list designating patch homologies in individual sequences is presented in the legend to Fig. 2.

The results of analyses on sequences listed in Table I are presented in Figs. 2-7 (see key in Table II). These schemes are self-explanatory. For cell line HE5, the right and left junction sequences have been investigated as well as the preinsertion site (Fig. 3). In the hamster tumor T1111(2) sequence, the situation is complicated in that an inverted fragment of Ad12 DNA has been interspersed into the cellular sequence [1]. Thus, three transitions between viral and cellular DNA had to be analyzed in this sequence (Fig. 7). As described earlier [8], the SYREC2 sequence stems from a symmetric recombinant between the leftmost 2081 nucleotides of Ad12 DNA and KB human cell DNA, and this recombinant has been isolated from purified Ad12 virus particles (Fig. 5).

Fig. 4. Computer analyses of the left junction (a) between Ad12 DNA and hamster cell DNA from the Ad12-induced hamster tumor line CLAC1 [4]. The preinsertion site (b) has also been analyzed.

Fig. 5. Computer analyses of the junction between nucleotide 2081 (left terminal) of Ad12 virion DNA and human KB cell DNA from the symmetric recombinant SYREC2 [8].

Fig. 6. Computer analyses of the left junction (a) between Ad12 DNA and mouse cell DNA from the Ad12-induced mouse tumor CBA-12-1-T [5]. The preinsertion site (b) has also been analyzed.

Fig. 7. Computer analyses of the junction between Ad12 DNA and hamster cell DNA from the Ad12-induced hamster tumor T1111(2) [1].

TABLE II
KEY TO DESIGNATIONS OF PATCH HOMOLOGIES
Patch homologies have been designated by capital letters as indicated below. If more than one such homology has been found, they have been consecutively numbered. When the same sequence was found in different junction sequences, it has been designated by the same combination of capital letter plus number.

Figure   Sequence                                      Capital letter
2        Ad12/CLAC3                                    A
3a       Ad2/HE5 left junction                         D
3b       Ad2/HE5 right junction                        C
3c       Ad2/HE5, preinsertion site                    E
4a       Ad12/CLAC1                                    F
4b       Ad12/CLAC1, preinsertion site                 J
5        Ad12/human KB SYREC                           I
6a       Ad12/mouse CBA-12-1-T                         G
6b       Ad12/mouse CBA-12-1-T, preinsertion site      B
7        Ad12/T1111(2)                                 H

Discussion
The computer-aided analyses of several sites of recombination between viral and cellular DNA have revealed a variety of signals, but no striking common features. Patch homologies between viral and cellular sequences which lie adjacent to each other, as well as between cellular sequences and deleted viral sequences, are apparent and have been described in detail earlier [1-5]. It is likely that some of these patch homologies could have played a role in stabilizing recombination complexes and hence have mediated the recombination event. However, such homologies have not always been seen. It is therefore questionable whether they have an essential function in the recombination between mammalian cellular and foreign DNAs. At the present stage of investigations, it would be premature to disclaim the possible importance of sequence arrangements at sites of recombination. Although we have not been able to discern clearly recognizable common features, it is conceivable that more complex patterns of sequence arrangements exist. More powerful computer programs may help to detect such patterns in the future. On the other hand, chromatin structure at sites of recombination may also play a decisive role. We have made the striking observation [9] that the cellular sequences at several sites of recombination are transcriptionally active and contain sequences homologous to small RNAs of 300-400 nucleotides in length. These observations have since been extended to several of the junction sites analyzed in our laboratory (unpublished data). The cellular sequence at the junction to Ad12 DNA in the symmetric recombinant SYREC2 is transcribed into polyadenylated RNA which can be translated in vitro (unpublished data). These RNAs are also found in cells devoid of viral genomes, i.e., in cells not transformed or infected by adenoviruses.
It appears plausible that actively transcribed regions of cellular DNA are predisposed for the interaction with foreign (viral) DNA introduced into the nuclear compartment. Transcriptional activity of cellular DNA may therefore constitute a necessary but perhaps not a sufficient precondition for recombination. Further research will be required to determine preconditions for recombination between cellular DNA and foreign (viral) genomes.
Acknowledgments
In the initial part of this work, Rainer Leisten, Frankfurt, provided invaluable advice in the computer analyses. This research has been made possible by the Deutsche Forschungsgemeinschaft through SFB74-C1, and by the Federal Ministry of Research and Technology in Bonn-Bad Godesberg (BCT 03652).
References
1 Doerfler, W., Gahlmann, R., Stabel, S., Deuring, R., Lichtenberg, U., Schulz, M., Eick, D. and Leisten, R. (1983) Current Topics Microbiol. Immunol. 109, 193-228
2 Deuring, R., Winterhoff, U., Tamanoi, F., Stabel, S. and Doerfler, W. (1981) Nature 293, 81-84
3 Gahlmann, R., Leisten, R., Vardimon, L. and Doerfler, W. (1982) EMBO J. 1, 1101-1104
4 Stabel, S. and Doerfler, W. (1982) Nucleic Acids Res. 10, 8007-8023
5 Schulz, M. and Doerfler, W. (1984) Nucleic Acids Res. 12, 4959-4976
6 Gahlmann, R. and Doerfler, W. (1983) Nucleic Acids Res. 11, 7347-7361
7 Deuring, R., Klotz, G. and Doerfler, W. (1981) Proc. Natl. Acad. Sci. USA 78, 3142-3146
8 Deuring, R. and Doerfler, W. (1983) Gene 26, 283-289
9 Gahlmann, R., Schulz, M. and Doerfler, W. (1984) EMBO J. 3, 3263-3269
10 Stüber, K. (1985) CABIOS 1, 35-42
11 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395