J. Mol. Biol. (1982) 158, 293-304
Nucleotide Sequence of Bovine 1.715 Satellite DNA and its Relation to other Bovine Satellite Sequences
Department of Biochemistry, Institute of Physiology and Biochemistry, Medical School, ul. Lindleya 6, 90-131 Łódź, Poland
(Received 17 August 1981)
The nucleotide sequence of the bovine 1.715 satellite DNA repeated unit was determined. About 80% of this 1399 base-pair sequence reveals more than 50% homology to a 31 base-pair long average sequence. The 31 base-pair periodicity also exists in the remaining 20% of its length, although it shows less homology with the average sequence. The 31 base-pair average sequence is composed of one undeca- and one dodecanucleotide, which are closely related, and of two almost identical tetranucleotides. The average sequence shows significant homology with the ancestral sequences of the bovine 1.706, 1.720b and 1.711a satellite DNAs. This observation argues for their origin from the same pool of related sequences.
1. Introduction
Restriction analysis of the satellite DNAs revealed that they are generally composed of tandemly arranged repeated units, each unit from several base-pairs to several thousand base-pairs in length (reviewed by Appels & Peacock, 1978; John & Miklos, 1979; Brutlag, 1980). The new rapid DNA sequencing methods enabled the determination of the nucleotide sequences of several satellite DNAs (Rosenberg et al., 1978; Carlson & Brutlag, 1979; Pech et al., 1979a,b; Pöschl & Streeck, 1980; Streeck, 1981; Thayer et al., 1981). It appeared that various types of internal sequence organization exist within their repeated units. The 172 bp† long basic repeated unit of the African green monkey satellite DNA has a unique sequence (Rosenberg et al., 1978). About 40% of the 359 bp long repeated unit of Drosophila melanogaster 1.688 g/cm3 satellite DNA is, despite the lack of an internally repetitive structure, composed of two short sequences or their variants (Carlson & Brutlag, 1979). The 370 bp long repeated unit of rat satellite DNA is composed of four different subrepeats, which show marked homology with each other (Pech et al., 1979b).
The nucleotide sequences of three of the eight known bovine satellite DNAs have been published. The simplest is the sequence organization of the 1.720b satellite DNA. Its 46 bp long repeated unit is composed of two related 23 bp sequences, both of them largely self-complementary (Pöschl & Streeck, 1980). Sequence determination of the bovine 1.706 satellite DNA revealed that its 2350 bp repeated unit can be divided into four segments, each consisting of different variants of the basic 23 bp sequence (Pech et al., 1979a). The extent of homology between the 23 bp sequences of the 1.720b and 1.706 satellite DNAs suggests that they have a common ancestral sequence (Pöschl & Streeck, 1980). The 1413 bp repeated unit of the bovine 1.711a satellite DNA is composed of two internally repetitive segments, based on the 1.706 satellite 23 bp long basic sequences, linked with a 611 bp long unrelated sequence lacking internal repetition (Streeck, 1981). Thus bovine 1.706, 1.711a and 1.720b satellite DNAs, despite their different mean base composition, belong to the same family of related sequences.
† Abbreviations used: bp, base-pairs; kb, 10^3 bases or base-pairs, as appropriate.
© 1982 Academic Press Inc. (London) Ltd.
A. PŁUCIENNICZAK ET AL.
2. Materials and Methods
(a) Isolation and fractionation of total DNA
Bovine DNA was isolated from the thymus gland as described by Malarska et al. (1979). 1.715 satellite DNA was isolated as the 1.4 kb EcoRI fragment from preparations enriched in this satellite DNA by means of precipitation with histone H1 (Skowronski et al., 1978). These enriched DNA preparations did not contain 1.711b satellite sequences, as judged from their electrophoretic pattern after digestion with EcoRI. The EcoRI 1.4 kb fragment was isolated from preparative 3.5% (w/v) polyacrylamide gels as described by Maxam & Gilbert (1980).
(b) Enzymes and reactions
Restriction nucleases AluI, Sau3AI, Sau96, EcoRII, HaeIII, Nsti, EcoRI, PstI and MspI were isolated according to the procedure of Bickle et al. (1977). Sau3AI, Sau96, EcoRII and HaeIII were further purified by DEAE-cellulose chromatography. The reaction conditions for Sau96 were as described by Sussenbach et al. (1978), for EcoRII as described by Kamp et al. (1977), and for EcoRI as described earlier (Klysik et al., 1979). For all the other restriction nucleases, the reactions were performed in 6 mM-Tris·HCl (pH 7%), 6 mM-MgCl2, 6 mM-β-mercaptoethanol.
(c) DNA sequence analysis
Sequence analysis of different restriction fragments derived from the 1.4 kb repeated unit was carried out according to Maxam & Gilbert (1980), except that specific degradation at purine sites was performed according to Gray et al. (1978). Five different specific reactions, for guanine, purine, adenine + cytosine, cytosine + thymine and cytosine residues, were used. About 80% of the sequence was read from both complementary DNA strands. The rest was determined from at least 3 independent sequencing gels. All restriction sites used for terminal labelling were also sequenced. The EcoRI site was sequenced from the independently isolated HaeIII fragment covering this site. The complete sequence was analysed on Mera 303 and Odra 1305 computers using programs developed in this laboratory.
3. Results
(a) General characteristics of bovine 1.715 satellite DNA
The G+C content of the bovine satellite I DNA amounts to 59.7%. Different buoyant densities of the complementary strands in alkaline CsCl gradients, noticed
by Filipski et al. (1973), are a result of the asymmetry of their base composition. The complete sequence obtained for the most abundant nucleotides at each position is shown in Figure 1. The strand presented has the following base composition (in mol%): A, 19.2; T, 21.0; G, 34.0; and C, 25.6. The G+C content along the 1399 bp repeated unit, computed for 140 bp long segments, varies from 52.1 to 67.9%, in agreement with thermal denaturation studies, which revealed the existence of an approximately 140 bp long region showing a remarkably higher melting temperature than the rest of the sequence (Skowronski et al., 1978). The buoyant density of the bovine 1.715 satellite DNA, calculated from its base composition, is 1.718 g/cm3, a value 0.002 to 0.003 g/cm3 higher than that estimated in other experiments (Filipski et al., 1973; Macaya et al., 1978). The difference may be due to sequence-dependent effects or to the presence of methylated cytosine residues (Kirk, 1967).
In all, 82 CpG dinucleotides are found within the 1399 bp repeated unit. They amount to 5.9% of the dinucleotides. We detected methylation in all but seven of these dinucleotides by the technique of Ohmori et al. (1978). Some cytosine residues in the CpG dinucleotides are methylated in almost all 1399 bp units, as judged from the absence of the Sau3AI cleavage site between positions 770 and 771 (Fig. 1). The presence of the Sau3AI recognition sequence at this position in almost all copies of the 1399 bp unit has been proved by examination of the MboI digestion pattern (Roizes, 1976; Streeck & Zachau, 1978).
For sequencing the 1.715 satellite DNA, we used total (uncloned) fragments. Despite this, most of the sequencing gels showed no ambiguity. All but one of the four ambiguities observed were due to base replacements. The remaining one was of a different kind. It was observed during sequencing of the region from position 896 to 1050; it started just behind a cluster of T residues at positions 904 to 909 and continued upstream in both complementary strands. The ambiguity was caused by the presence of a dT·dA cluster of varying length: it is 5 bp long in about 40% of the copies of the 1.715 satellite repeated sequence, whereas 60% of the repeats contain six consecutive dT·dA base-pairs. These two classes of the 1.715 satellite repeated units, one 1399 bp long and the second 1398 bp long, are the first evidence for the existence of subfamilies within this satellite.
The idea of 1.715 satellite subfractions was originally formulated by Roizes (1976), who postulated the existence of two populations of 1.4 kb units differing in the localization of one of the HhaI cleavage sites. Sequence analysis reveals, however, that most of the repeats contain three sites recognized by HhaI (positions 458 to 461, 499 to 502 and 1308 to 1311). Different extents of methylation of the cytosine residues in each of these three sites are responsible for the complex gel pattern observed for the HhaI digest of the 1.715 satellite DNA.
The 1.715 satellite repeated unit contains several MspI sites. One of them (575 to 578) is MspI sensitive in only about 70% of the 1399 bp units. In this case, sequence analysis revealed the missing MspI site to be due to the C to T transition at position 575, instead of methylation of the external cytosine residue in the 5′ C-C-G-G 3′ sequence that is known to protect this sequence from cleavage by MspI (Sneider, 1980). Thus, sequence analysis provided two pieces of evidence for the existence of subpopulations among the 1.715 satellite repeated units.
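The composition figures quoted above (mol% of each base, G+C content of 140 bp segments, number of CpG dinucleotides) are plain counts over the sequence of Figure 1. The Python sketch below shows one way such counts could be made. It is not the authors' program: the function names are illustrative, the use of consecutive non-overlapping windows is an assumption, and the linear relation between buoyant density and G+C fraction used in `buoyant_density` (1.660 + 0.098 × GC, a commonly used approximation) is also an assumption, because the text does not say which formula was applied.

```python
from collections import Counter


def base_composition(seq):
    """Return the mol% of A, T, G and C in seq."""
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ATGC"}


def gc_fraction(seq):
    """Fraction (0 to 1) of G and C residues in seq."""
    return sum(1 for base in seq if base in "GC") / len(seq)


def windowed_gc(seq, window=140):
    """G+C fraction of consecutive, non-overlapping windows.

    Assumption: windows do not overlap; the last one may be shorter.
    """
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


def count_dinucleotide(seq, dinucleotide="CG"):
    """Count occurrences of a dinucleotide on the given strand."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


def buoyant_density(gc):
    """Buoyant density (g/cm3) from the G+C fraction.

    Assumed linear relation, not taken from the text.
    """
    return 1.660 + 0.098 * gc


if __name__ == "__main__":
    # Positions 1 to 50 of Figure 1, used only as a small illustration.
    fragment = "AATTCAGGCTGCCTCTTGTGTTGGCCCAGGCAAGTCCAATCTTCCATTCG"
    print(base_composition(fragment))
    print(count_dinucleotide(fragment, "CG"))
    print(buoyant_density(0.597))
```

With the 59.7% G+C content reported here, the assumed relation gives a density of about 1.7185 g/cm3, close to the 1.718 g/cm3 quoted in the text.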
AATTCAGGCT GCCTCTTGTGTTGGCCCAGGCAAGTCCAAT CTTCCATTCG
50
AGTTGCGAAGGAAAGCTGGGGATTGCTCTC GAGTGACTGCAGGGCCAATA
100
GACCTCATCTAGGCTTGTGT CCAGAAGCCAATGTTCCTCT CCAGGGGCGA
150
CAGGGATCTCGGGGTTGCATTCCAGACGCACCCGGGGAGACAGGCATTCA
200
TCTCGAGTGGAAGCAAAGAA CCCCGCTCTGCTCTCGAATT GTGACGGGTA
250
TCTCTTGGAG CTCACTGGGTGGACTCAAGGGAGTCAAGCCTCCTGAGGCG
300
TTTGGAGAGAGGTCGCGAGATTGGTCTCTA GGCCATGCAGGAGACGAAGG
350
CCCTCATCTC TCGATGACGGCCCAATCTCGGGGTTGTTCT CGAGCGGCGG
400
CCCCAGTGTGCGGTTTCTCA CGAGGTACAACGGCGAGGTCAGTGAGCCTC
450
TCGTGGGGCGCCAGGGAAGTCGGGTCTCCATGCGAGTGGCGAGGGGGAGC
500
GCGTCATTGCTCCCGAGCCATGGTAGGGGAATCTGGCCTC GAGACGTGTT
550
GAAGAAGGTCTCTCGAGGGCTTTCCCGGGTTGAGGCAGGAAACCCTGGGT
600
TCCCTCGACT TGTGCAGGTGACCTCAGGGGGCTTCTCACGGTGGCTCTGA
650
GAAGCCAGGGAAACTGGAGGTGGGAGGGGC CTCTTGGGACTCCACTGGGC
700
TTGGTGCATT GGAAGAGGGCCTCATCTCCA GTGGAGGCAGGAACCGCAGG
750
TACCTCTGAT TTCAGACTCC GATCGCAGGGTCCCTGCAGACTGGGGACAG
800
GAGAGTCAGGCCTCGTCTTG GGTTGAGGCATGGAACTCCGCTTGCCTCTC
850
GAGATGTCCCCGGGGAGAGAGGCCGCTTGTCGAGCTGTATTTGGAACCTG
900
GGGTTTTTTC CGAACGATGCACGGAAAAACTGCCCCCTCGTGTTGACTTC
950
ATTCACAGGC TGGAGTTCGGAGAGGTGTCCGGGCATCGGGTTCTTATCAA
1000
GAGGGGACCGGGAAATCGGGGTCCTACGGAATGTGGAACCACCCACGAGG
1050
CCACGTCTGGAATGTCTTCG TGAGACCGGCCTCATCCTGA GGTGCGACCG
1100
GAAGGTCGGGAACCCCTTCCAGACAAAGCAGGGGAGTCGACCCTCCTGTC
1150
CAGATCAGGAGGGGAGAAAGGGCTCAGAGGAGGGGGTGCCGGAAAACCTC
1200
AGTGTTCCTC TCGAGGGAGACCGGGATTTCGGGGAACTTTGTGGGTCGCA
1250
TCAAGGGTGCCAAGTGCCCTTTCGACCTCCAATTCCTAAC GTGGGACTTC
1300
TCCTGAGGCGCTGTAGCCCCAAAGGGCTTCATCTTGCGAT GACGGGGGAG
1350
CCACGTGGTTTTTCTCGAGT TACGGCGGGATTCTCAAGTT GCGACGGGG
1399
FIG. 1. The nucleotide sequence of the most abundant nucleotide residue at every position of the uncloned 1399 bp long repeated unit of bovine 1.715 satellite DNA. Hyphens have been omitted for clarity.
(b) Internal periodicity in the 1399 bp repeated unit
Internal periodicity of the 1399 bp repeated unit is hardly detectable without computer analysis of its sequence. Several independent procedures were applied in the search for regularities in the sequence organization. The first procedure was based on comparing the sequence with itself but shifted successively in steps of one base to obtain all possible relative arrangements. The degree of sequence homology, in terms of the number of identical nucleotides in corresponding positions, was computed for each alignment. A plot of the number of identical nucleotides as a function of the shift length is given in Figure 2. The highest values observed are grouped around shifts that are multiples of 31 nucleotides. This indicates the existence of some kind of 31 bp periodicity within the repeat unit. In order to check the reliability of this suggestion, the distribution of the numbers of identical nucleotides in corresponding positions was analysed. The numbers of identical nucleotides for 31 bp and 62 bp shifts lie far from the mean of the distribution, giving strong support for the existence of 31 bp periodicity. In a computer search for short nucleotide sequences occurring frequently in the 1399 bp repeated unit, we found the sequence 5′ T-C-T-C-G-A-G-T 3′ or its variants occurring regularly at 29 to 33 bp intervals. This allowed alignment of several regions of the repeated unit as arrays of 31 bp related sequences. Further examination of the sequence allowed presentation of almost the whole length of the repeated unit in the form of arrays of 30 to 31 bp long related sequences, and construction of the 31 bp long average sequence composed of the most
FIG. 2. Evidence for the intrinsic periodicity of the 1399 bp long repeated unit of the bovine 1.715 satellite DNA. The number of identical nucleotides occurring at corresponding positions, obtained by comparison of the 1399 bp sequence with itself when shifted, is plotted as a function of the shift length (bases). In the first step of computer analysis, the sequence was compared with itself but shifted by 1 nucleotide. In the consecutive steps, the alignment was successively changed by 1 nucleotide.
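The self-comparison described in this legend can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the original programs (run on the Mera 303 and Odra 1305) are not described in the text, the function names are hypothetical, the comparison is assumed to be linear (only the overlapping part of the two copies is scored, with no wrap-around), and the test sequence is a toy array of six exact copies of the 31 bp average sequence rather than the real 1399 bp unit.

```python
AVERAGE_UNIT = "ACTCGGGGTTCCTCTCGAGTTGCGGCAGGGA"  # 31 bp average sequence
TOY_ARRAY = AVERAGE_UNIT * 6  # toy tandem array, no divergence between copies


def shift_identities(seq, max_shift):
    """For each shift s in 1..max_shift, count positions i with seq[i] == seq[i + s]."""
    counts = {}
    for s in range(1, max_shift + 1):
        counts[s] = sum(1 for i in range(len(seq) - s) if seq[i] == seq[i + s])
    return counts


def identity_fractions(seq, max_shift):
    """Same counts, divided by the number of positions compared at each shift."""
    counts = shift_identities(seq, max_shift)
    return {s: counts[s] / (len(seq) - s) for s in counts}


if __name__ == "__main__":
    fractions = identity_fractions(TOY_ARRAY, 62)
    best_shift = max(fractions, key=fractions.get)  # first shift with the top score
    print(best_shift, fractions[best_shift])
```

On an exact tandem array the score is complete only at shifts that are multiples of the period. In the real, diverged sequence the peaks are lower and have to be judged against the distribution of values over all shifts, as described in the text.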
(a)     (b)                                (c)
2       ATTCAGGCTGCCTCTTGTGTTGGCCCAGGCA    21
34      GTCCAATCTTCCATTCGAGTTGCGAAGGAAA    17
65      GCTGGGGATTGCTCTCGAGTGACTGCAGGGC    23
97      AATAGACC TCATCTAGGCTTGTGTCCAGAA    16
126     GCCAATGTTCCTCTCCAGGGGCGACAGGGA     21
157     TCTCGGGGTTGCATTCCAGACGCACCCGGGG    20
189     GACAGGCATTCATCTCGAGTGGAAGCAAAGA    19
220     ACCCCGCTCTGCTCTCGAATTGTGACGGGTA    20
...
282     AGTCAAGCCTCCTGAGGCGTTTGGAGAGAGG    16
313     TCGCGAGATTGGTCTCTAGGCCATGCAGGAG    17
...
376     TCTCGGGGTTGTTCTCGAGCGGCGGCC CCA    22
406     GTGTGCGGTTTCTCACGAGGTACAACGGCGA    18
437     GGTCAGTGAGCCTCTCGTGGGGCGCCAGGGA    21
468     AGTCGGGTCTCCATGCGAGTGGCGAGGGGGA    21
499     GCGCGTCATTGCTCCCGAGCCATGGTAGGGG    18
530     AATCTGGCCTCGAGACGTGTTGAAGAAGGTC    17
561     TCTCGAGGGCTTTCCCGGGTTGAGGCAGGAA    21
592     ACCCTGGGTTCCCTCGACTTGTGCAGGTGA     22
622     CCTCAGGGGGCTTCTCACGGTGGCTCTGAGA    18
...
682     TCTTGGGACTCCACTGGGCTTGGTGCATTGG    18
713     AAGAGGGCCTCATCTCCAGTGGAGGCA GGA    22
743     ACCGCAGGTACCTCTGATTTCAGACTCCGA     18
773     TCGCAGGGTCCCTGCAGACTGGGGACAGGAG    18
804     AGTCAGGCCTCGTCTTGGGTTGAGGCATGGA    22
835     ACTCCGCTTGCCTCTCGAGATGTCCCCGGGG    21
896     ACCTGGGGTTTTTTCCGAACGATGCACGGA     19
...
983     GCATCGGGTTCTTATCAAGAGGGGACCGGGA    19
1073    AGACCGGCCTCATCCTGAGGTGCGACCGGAA    19
1104    GGTCGGGAACCCCTTCCAGACAAAGCAGGGG    17
1135    AGTCGACCCTCCTGTCCAGATCAGGAGGGGA    19
...
1197    CCTCAGTGTTCCTCTCGAG GGAGACCGGGA    23
1227    TTTCGGGGAACTTTGTGGGTCGCATCAAGGG    17
1289    ACGTGGGACTTCTCCTGAGGCGCTGTAGCCC    17
1320    CAAAGGGCTTCATCTTGCGATGACGGGGGAG    16
1351    CCACGTGGTTTTTCTCGAGTTACGGC GGGA    24
1381    TTCTCAAGTTGCGACGGGGA               16
Average sequence
        ACTCGGGGTTCCTCTCGAGTTGCGGCAGGGA
FIG. 3. The arrays of related 31 bp sequences constituting 78% of the bovine 1.715 satellite repeated unit. Each 31 bp sequence listed in (b) shows not less than 50% homology with the average sequence. (a) Position in the 1399 bp sequence of the first nucleotide of the given 31 bp sequence listed in (b). (c) The number of nucleotides occurring at corresponding positions in the average sequence and in the given 31 bp sequence listed in (b). The 31 bp average sequence composed of the most frequent nucleotides at each position is given at the bottom of the Figure. Hyphens have been omitted for clarity.
frequent nucleotides at each position (Figs 3 and 4). The minimum value for homology between each of the sequences listed in Figure 3 and the average sequence is 50%, the mean value being 64%. Altogether, these sequences include 1099 bp of the 1399 bp repeated unit. The 31 bp sequences with at least 50% homology are clustered in some regions of the repeated unit and dispersed in others. When dispersed, they are usually separated by several 31 bp long sequences, which reveal less than 50%, if any, homology to the 31 bp average sequence. In two exceptions to the above rule, the periodicity is disturbed by 11 bp and 6 bp deficiencies. One of them is probably due to the deletion of an 11 bp sequence between positions 1380 and 1381, as deduced from the analysis of the surrounding sequences (see Fig. 3). The exact position of the 6 bp deletion is difficult to determine, but it is between positions 945 and 971.
A search for possible higher-order periodicities, based on the multimers of the 31 bp unit that might reflect intermediate steps in the generation of the 1399 bp unit, resulted in finding an imperfect 885 bp long direct repeat. It consists of two sequences (1055 to 106 and 107 to 540) which, when aligned with each other assuming five deletions (a 29 bp sequence between 208 and 209, 11 bp between 1380 and 1381, and three deletions of one nucleotide between positions 477 and 478, 1058 and 1059, and 1315 and 1316), show 60% homology. This degree of homology is relatively high compared with 47%, the average homology between all possible pairs formed from the 31 bp sequences listed in Figure 3. These data suggest that a duplication of a 460 bp long sequence composed of 31 bp units took place during formation of the 1399 bp repeated unit. In the 514 bp long sequence localized outside the direct repeat (positions 541 to 1054), there are several regions, not longer than 60 bp, revealing more than 60% homology with various sequences irregularly localized on the 1399 bp repeated unit. The 514 bp sequence is also based on the 31 bp periodicity.
FIG. 4. Construction of the 31 bp average sequence. The frequencies of A, T, G and C at every nucleotide position of the 36 sequences listed in Fig. 3 were determined. The average sequence was constructed by connecting the most frequent nucleotides at every position.
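The construction described in this legend, together with the match counts of Figure 3 (c), can be illustrated with a short Python sketch. It is a minimal example, not the authors' program: the function names are hypothetical, the sequences are assumed to be already aligned and of equal length (the gaps used in Figure 3 are not handled), and ties between equally frequent nucleotides are resolved arbitrarily by first occurrence.

```python
from collections import Counter

AVERAGE = "ACTCGGGGTTCCTCTCGAGTTGCGGCAGGGA"  # average sequence of Fig. 3
UNIT_65 = "GCTGGGGATTGCTCTCGAGTGACTGCAGGGC"  # unit starting at position 65


def consensus(aligned):
    """Most frequent nucleotide in each column of equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    # most_common(1) returns the first-seen nucleotide when counts are tied.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def identities(a, b):
    """Number of positions at which two aligned sequences carry the same nucleotide."""
    return sum(1 for x, y in zip(a, b) if x == y)


def percent_homology(a, b):
    """Identities as a percentage of the length of the first sequence."""
    return 100.0 * identities(a, b) / len(a)


if __name__ == "__main__":
    print(consensus(["ACG", "ATG", "ACT"]))
    print(identities(AVERAGE, UNIT_65), percent_homology(AVERAGE, UNIT_65))
```

For the unit starting at position 65, the count of identical positions agrees with the value of 23 listed in column (c) of Figure 3.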
FIG. 5. Sequence similarities within the 31 bp average sequence of the bovine 1.715 satellite DNA. The 31 bp sequence can be presented as related dodeca- and undecanucleotides, together with the adjacent octanucleotide composed of 2 related tetranucleotides. Identical nucleotides at corresponding positions are shown in boxes. Hyphens have been omitted for clarity.
The 31 bp average sequence has several characteristic features. It may be considered as two pairs of direct repeats, including nucleotides 1 to 23 and 24 to 31 (Fig. 5). Nucleotides 1 to 12 can also be aligned with nucleotides 12 to 23, eight nucleotides being identical in both sequences in corresponding positions. The remaining 8 bp long sequence is composed of two tetranucleotides, whose sequences differ from each other at one position. Three dyad symmetry structures can be distinguished in the 31 bp average sequence. Their centres are localized between positions 10 and 11, POand Xl, and 16 and 17, and they involve 16, 11 and 12 nucleotides, respectively (Fig. 6).
[FIG. 6. Dyad symmetry structures in the 31 bp average sequence. The diagram is not reproduced here.]
The sequences of bovine 1.706 and 1.720b satellites and that of the repetitive part of the 1.711a satellite appeared to evolve from a common ancestral sequence (Pech et al., 1979a; Pöschl & Streeck, 1980; Streeck, 1981). The extent of homology between the 1.715 average sequence and the sequence of each strand of the 1.720b satellite amounts to 55% (Fig. 7(a)). It also reveals a 55% homology to the 46 bp prototype sequence of the 1.706 satellite segment B, and a 48% homology to the 23 bp prototype sequence of the 1.706 satellite Sau segments (Fig. 7(b)). It is worth noticing that when the related 1.706 and 1.720b sequences are aligned to obtain maximal homology with the 1.715 average sequence, they also
FIG. 7. Homologies between the bovine 1.715 satellite DNA 31 bp average sequence and sequences of the bovine 1.706 and 1.720b satellite DNAs. (a) Comparison of the 31 bp average sequence with the sequences of the 1.720b satellite F and S strands (Pöschl & Streeck, 1980). (b) Comparison of the 31 bp average sequence with the 1.706 satellite segment Sau 23 bp prototype sequence (Pech et al., 1979a) and with the 46 bp prototype sequence of the 1.706 satellite segment B (Pöschl & Streeck, 1980). All sequences are aligned to obtain maximum homology with the 31 bp average sequence. Identical nucleotides at particular positions in the compared sequences are marked with dots. Hyphens have been omitted for clarity. (c) Schematic presentation of the relationship between the bovine 1.715 satellite 31 bp average sequence and the 1.706, 1.720b and 1.711a satellite 23 bp long repeated units. The 31 bp unit is considered as derived from one 23 bp unit and an 8 bp fragment of an adjacent 23 bp unit. The hatched region indicates the area of maximal homology.
show maximal homology with each other. The region homologous to the 31 bp 1.715 average sequence in sequences based on 23 bp periodicity consists of one 23 bp unit and of eight consecutive nucleotides derived from the adjacent one (Fig. 7(b) and (c)). On the other hand, both the 31 bp average sequence and the 23 bp ancestral sequences contain related undeca- and dodecanucleotides. These data suggest that sequences of the bovine 1.706, 1.711a, 1.720b and 1.715 satellites may all have developed from the same pool of related sequences.
4. Discussion
The analysis of the relative arrangements of different sequence variants of the short-range repeat unit suggests particular mechanisms responsible for the formation of an internally repetitive long-range repeated unit. Such an analysis, performed by Pech et al. (1979a) for bovine 1.706 satellite DNA and by Streeck (1981) for bovine 1.711a satellite DNA, led them to propose schemes for the generation of long-range repeated units of these satellites. In the case of the 1399 bp repeated unit of the bovine 1.715 satellite, the degree of homology between individual 31 bp sequences is high enough for an unambiguous statement of its 31 bp internal periodicity. On the other hand, any subtle analysis is bound to fail because of the high background of divergence. The only feature suggesting intermediate steps in the course of formation of the 1399 bp unit is the existence of two adjacent, approximately 460 bp long sequences with a relatively high degree of homology. Thus, provided that the ancestral sequence for the 1.715 satellite was 31 bp long, one should expect at least two amplification steps in the formation of the 1399 bp unit. The first gave rise to an approximately 950 bp long array of copies of the 31 bp ancestral sequence. In the second step, the duplication of a 460 bp long region of the array took place. Assuming this scheme to be adequate, the intervals between subsequent amplifications should be long enough to introduce the observed high degree of divergence. An accurate estimation of these intervals is difficult in view of two facts. Pech et al. (1979a) observed different mutation rates at various positions of the 23 bp repeated units of the bovine 1.706 satellite DNA. Different mutation rates, depending on the position in the long-range repeated unit, also became evident after the comparison of the 1399 bp repeated sequence of the 1.715 satellite with the homologous sequence present in the bovine 1.711b satellite (Streeck, unpublished results). The extent of homology between these two sequences varies from 97 to 87% when calculated for non-overlapping, 200 bp long regions.
Different mutation rates along the 1.715 satellite 1399 bp ancestral sequence may be responsible for the observed interspersion of sequences showing a high degree of homology to the 31 bp average sequence with the more diverged ones. The sequence divergence along 31 bp sequences showing at least 50% homology to the 31 bp average sequence varies for different regions of the period, and is least between positions 13 and 19 (Fig. 4). The plot of nucleotide frequency for each position of the 31 bp period reflects the extent of divergence that was introduced into the array of 31 bp units before the amplification leading to the formation of bovine 1.715 satellite DNA. One can ask whether the regions that were the most variable before the amplification also remain the least stable after the final multiplication of the 1399 bp unit. Some insight into this problem may be given by the analysis of the experimental data presented by Roizes et al. (1980), performed in terms of the variable sequence divergence along the 31 bp unit discussed above. We made an attempt to find the correlation between the sequence divergence at given positions of 31 bp units and the different mutation rates observed for bovine 1.715 satellite DNA by means of restriction enzyme analysis. Roizes et al. (1980) found different mutation rates for two PstI recognition sequences present in 1399 bp repeated units. According to their data, the rate of mutation for the cleavage site localized between positions 87 and 92 is greater than for that at positions 784 to 789 in Figure 1. Positions 87 to 92 and 784 to 789 of the 1399 bp repeated unit correspond to positions 23 to 28 and 12 to 17, respectively, of the average sequence (compare Figs 1, 3 and 4). The sequence divergence in the
region 23 to 28 within the 31 bp period is greater than that for positions 12 to 17 (Fig. 4). Thus, the PstI recognition sequence with the higher rate of mutation is localized in the more variable region of the 31 bp unit. The conclusion that less stable regions of the 1399 bp repeated unit correspond to the most diverged regions of 31 bp units, however, is not supported by the results of a similar analysis made for three sequences containing the tetranucleotide G-A-T-C, the MboI and Sau3AI recognition sequence, that are present in the 1399 bp repeated unit. The last T residue in the G-G-A-T-C-T sequence (positions 154 to 159; Fig. 1) occupies position 3 of the 31 bp unit. The sequence divergence at this position is rather high, in spite of the fact that the mutation leading to generation of the BamHI site does not take place (Furtak, Klysik, Płucienniczak & Bartnik, unpublished results). Position 3 is also occupied by the guanine that follows the next MboI site (positions 771 to 774; Fig. 1). In the preparations of bovine 1.715 satellite DNA isolated from calf thymus, the cytosine residue at position 774 is completely methylated, blocking Sau3AI cleavage of this sequence. Every base change at position 775 should lead to the activation of a Sau3AI cleavage site, but such a mutation is not detectable. Thus, the situation is similar to that described above for the first MboI site. Another case is the situation for the hexanucleotide A-G-A-T-C-A (positions 1152 to 1157). The A to T base change at position 1152 is necessary to generate the BclI recognition sequence. Such a mutation occurs frequently (Roizes et al., 1980), despite the fact that the above-mentioned adenine residue occupies position 18 in the 31 bp unit, which is more stable than position 3. Therefore, there is no obvious correlation between the extent of divergence at different positions of the 31 bp unit and mutations introduced after the amplification of the ancestral 1399 bp unit.
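The correspondence between positions in the 1399 bp unit and positions in the 31 bp period, used repeatedly in this argument, follows from the start positions listed in column (a) of Figure 3. The Python sketch below is a hypothetical helper, not part of the original analysis. It assumes that no gap precedes the queried position inside its 31 bp unit (gapped units would need the alignment of Figure 3), and it returns nothing for positions lying outside the listed units. The small `find_sites` function locates a recognition sequence in a fragment of Figure 1.

```python
from bisect import bisect_right

# First-nucleotide positions of the 31 bp units, column (a) of Fig. 3.
UNIT_STARTS = [
    2, 34, 65, 97, 126, 157, 189, 220, 282, 313, 376, 406, 437, 468, 499,
    530, 561, 592, 622, 682, 713, 743, 773, 804, 835, 896, 983, 1073, 1104,
    1135, 1197, 1227, 1289, 1320, 1351, 1381,
]
PERIOD = 31


def period_position(position):
    """Map a 1-based position in the 1399 bp unit to (unit start, position in period).

    Returns None when the position is not covered by a listed unit.
    Assumption: gaps inside a unit are ignored.
    """
    index = bisect_right(UNIT_STARTS, position) - 1
    if index < 0:
        return None
    start = UNIT_STARTS[index]
    offset = position - start + 1
    if offset > PERIOD:
        return None
    return start, offset


def find_sites(seq, site, first_position=1):
    """1-based positions at which `site` begins in `seq`."""
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(first_position + i)
        i = seq.find(site, i + 1)
    return hits


if __name__ == "__main__":
    fragment_81_100 = "GAGTGACTGCAGGGCCAATA"  # positions 81 to 100 of Fig. 1
    print(find_sites(fragment_81_100, "CTGCAG", first_position=81))  # PstI site
    for position in (87, 92, 784, 789, 159, 775, 1152):
        print(position, period_position(position))
```

Under these assumptions the helper reproduces the assignments given in the text: positions 87 to 92 fall on positions 23 to 28 of the period, 784 to 789 on 12 to 17, 159 and 775 on position 3, and 1152 on position 18.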
Because of that, we have to consider the 1399 bp repeated unit, and not the 31 bp one, as a unit of mutational processes in bovine 1.715 satellite DNA.
We are very much indebted to Dr R. E. Streeck for encouragement and for communicating his unpublished data. We thank Dr D. Brutlag for reading the manuscript, Dr E. Bartnik for the gift of polynucleotide kinase, and Dr H. Pluciennik, in whose laboratory [32P]ATP was prepared. We are also grateful to Dr K. Furtak, who communicated the results of restriction analysis of the bovine 1.715 satellite DNA, to Dr H. Panusz for his continuous interest in this work, and to Mrs K. Siewierska for her skilful technical assistance. This work was supported within the project 09.7.2.3.2. by the Polish Academy of Sciences.
REFERENCES
Appels, R. & Peacock, W. J. (1978). Int. Rev. Cytol. 8, 7&126.
Bickle, T. A., Pirrotta, V. & Imber, R. (1977). Nucl. Acids Res. 4, 343-353.
Brutlag, D. L. (1980). Annu. Rev. Genet. 14, 121-217.
Carlson, M. & Brutlag, D. (1979). J. Mol. Biol. 135, 483-500.
Filipski, J., Thiery, J. P. & Bernardi, G. (1973). J. Mol. Biol. 80, 177-197.
Gray, C. P., Sommer, R., Polke, C., Beck, E. & Schaller, H. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 50-53.
John, B. & Miklos, G. L. G. (1979). Int. Rev. Cytol. 58, 1-114.
Kamp, D., Kahmann, R., Zipser, D. & Roberts, R. J. (1977). Mol. Gen. Genet. 154, 231-240.
Kirk, G. T. (1967). J. Mol. Biol. 28, 171-172.
Klysik, J., Furtak, K., Skowronski, J., Płucienniczak, A. & Panusz, H. (1979). Bull. Acad. Pol. Sci. Ser. Biol. Cl. II, 27, 87-91.
Macaya, G., Cortadas, J. & Bernardi, G. (1978). Eur. J. Biochem. 84, 179-188.
Malarska, K., Płucienniczak, A. & Skowronski, J. (1979). Biochim. Biophys. Acta, 561, 3% 333.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Ohmori, H., Tomizawa, J. & Maxam, A. M. (1978). Nucl. Acids Res. 5, 1479-1485.
Pech, M., Streeck, R. E. & Zachau, H. G. (1979a). Cell, 18, 883-893.
Pech, M., Igo-Kemenes, T. & Zachau, H. G. (1979b). Nucl. Acids Res. 7, 417-432.
Pöschl, E. & Streeck, R. E. (1980). J. Mol. Biol. 143, 147-154.
Roizes, G. (1976). Nucl. Acids Res. 3, 2677-2696.
Roizes, G., Pages, M. & Lecou, Ch. (1980). Nucl. Acids Res. 8, 3779-3792.
Rosenberg, H., Singer, M. & Rosenberg, M. (1978). Science, 200, 394-402.
Skowronski, J., Furtak, K., Klysik, J., Panusz, H. & Płucienniczak, A. (1978). Nucl. Acids Res. 5, 4077-4085.
Sneider, W. T. (1980). Nucl. Acids Res. 8, 3829-3840.
Streeck, R. E. (1981). Science, 213, 443-445.
Streeck, R. E. & Zachau, H. G. (1978). Eur. J. Biochem. 89, 267-279.
Sussenbach, J. S., Steenbergh, P. H., Rost, J. A., van Leeuven, W. J. & van Embden, J. D. A. (1978). Nucl. Acids Res. 5, 1153-1163.
Thayer, R. E., Singer, M. F. & McCutchan, T. F. (1981). Nucl. Acids Res. 9, 169-181.
Edited by S. Brenner