J. Mol. Biol. (1982) 157, 453-471
A New Family of Interspersed Repetitive DNA Sequences in the Mouse Genome
WOLFGANG GEBHARD, THOMAS MEITINGER, JOSEF HÖCHTL AND HANS G. ZACHAU
Institut für Physiologische Chemie, Physikalische Biochemie und Zellbiologie der Universität München, GFR
(Received 11 January 1982, and in revised form 9 February 1982)
Repetitive DNA sequences near immunoglobulin genes in the mouse genome (Steinmetz et al., 1980a,b) were characterized by restriction mapping and hybridization. Six sequences were determined that turned out to belong to a new family of dispersed repetitive DNA. From the sequences, which are called R1 to R6, a 475 base-pair consensus sequence was derived. The R family is clearly distinct from the mouse B1 family (Krayev et al., 1980). According to saturation hybridization experiments, there are about 100,000 R sequences per haploid genome, and they are probably distributed throughout the genome. The individual R sequences have an average divergence from the consensus sequence of 12.5%, which is largely due to point mutations and, among those, to transitions. Some R sequences are severely truncated. The R sequences extend into A-rich sequences and are flanked by short direct repeats. Also, two large insertions in the R2 sequence are flanked by direct repeats. In the neighbourhood of and within R sequences, stretches of DNA have been identified that are homologous to parts of small nuclear RNA sequences. Mouse satellite DNA-like sequences and members of the B1 family were also found in close proximity to the R sequences. The dispersion of R sequences within the mouse genome may be a consequence of transposition events. The possible role of the R sequences in recombination and/or gene conversion processes is discussed.
1. Introduction
Dispersed repetitive DNA sequences, which constitute a major part of the mammalian genome, have been widely investigated in recent years (for reviews, see Jagadeeswaran et al., 1981; Singer, 1982). In the human genome there are about 300,000 copies of the so-called Alu family (Deininger & Schmid, 1976; Houck et al., 1979). This repetitive DNA was prepared by renaturation of heated DNA and subsequent treatment with S1 nuclease. Sequences were determined in the 300 bp† long DNA fragments obtained by this procedure (Rubin et al., 1980), in clones derived from such material (Deininger et al., 1981), and in specific gene-containing fragments (Baralle et al., 1980; Bell et al., 1980). Sequences homologous to the Alu sequences were found in the DNA of African green monkey cells (Dhruva et al., 1980; Grimaldi et al., 1981), Chinese hamster ovary cells (Haynes et al., 1981) and mouse cells (Kramerov et al., 1979; Krayev et al., 1980). The Alu-like sequences of the mouse, which have been termed the B1 family, occur in 40,000 to 80,000 copies in the mouse genome. A member of the B1 family was recently found next to a β-globin gene and has been sequenced (Coggins et al., 1982). In addition to the sequenced repetitive elements of the B1 family and a repetitive element near a gene for a transplantation antigen (Steinmetz et al., 1981), further repetitive elements have been found in the mouse genome but have not been sequenced (Manuelidis, 1980; Cheng & Schildkraut, 1980; Heller & Arnheim, 1980; Brown & Dover, 1981). Recently, an evolutionarily conserved repetitive sequence that had been overlooked up to now was also detected in mouse DNA (Miesfeld et al., 1981). Moreover, repetitive sequences are located near the genes for β-globin (Haigwood et al., 1981), ribosomal RNA (Kominami et al., 1981) and immunoglobulins (Steinmetz et al., 1980b; Arnheim et al., 1980). In a comprehensive electron microscopic study of mouse DNA, about 300,000 palindromic sequences per genome were detected, many of which may be dispersed repetitive elements (Biezunski, 1981). In this paper, we report the sequences of several dispersed repetitive elements of the mouse genome. In previous work at our laboratory, these sequences have been localized near immunoglobulin genes (Steinmetz et al., 1980a,b).
† Abbreviations used: bp, base-pairs; kb, 10^3 base-pairs.
0022-2836/82/150453-19 $03.00/0 © 1982 Academic Press Inc. (London) Ltd.
2. Materials and Methods
Methods, as well as most materials, were as described by Steinmetz & Zachau (1980); blot hybridization was as in Fig. 3 of that paper. Experiments involving isolation and propagation of recombinant plasmids were carried out under L2/B1 conditions in accordance with the German biosafety guidelines. DNA sequencing was done by the chemical method of Maxam & Gilbert (1980). Calf intestinal phosphatase, phage T4 polynucleotide kinase and HpaII came from Boehringer Mannheim, DdeI, HphI and RsaI were from Biolabs, and XbaI was from Bethesda Research Laboratories. DNA ligase was a gift from R. E. Streeck. Restriction nuclease digests were carried out in 10 mM-Tris.HCl (pH 7.5), 50 mM-NaCl, 8 mM-MgCl2, 1 mM-dithiothreitol (for AluI, BglII, BamHI, BspRI, EcoRI, HhaI, HinfI, MboII and SstI), in 6 mM-Tris.HCl (pH 7.5), 60 mM-NaCl, 15 mM-MgCl2, 6 mM-2-mercaptoethanol (for DdeI, RsaI, Sau3A and Sau96) or in the same buffer without NaCl (for HpaII and HindIII). Cleavage by HphI was done in 10 mM-Tris.HCl (pH 7.4), 10 mM-MgCl2, 50 mM-KCl, 0.5 mM-dithiothreitol, as specified by the manufacturer.
Liver DNA from Balb/c mice was prepared from nuclei according to the method of Gross-Bellard et al. (1973), with slight modifications. DNA from chicken blood and calf thymus DNA came from Calbiochem and Boehringer Mannheim, respectively. DNA of African green monkey (CV-1 cells), rat liver (Sprague-Dawley) and yeast (wild-type) were gifts from F. Fittler, T. Igo-Kemenes and W. Hörz, respectively. The clone of mouse satellite DNA in pBR322, comprising about 50 repeat units, was a gift from P. Schulz.
Computer programs for storing and handling sequence data, as well as the programs for the simultaneous search for homologies in 4 sequences, were as used by Neumaier (1981). Dot matrix comparison programs, similar to that used by Efstratiadis et al. (1980), were used in screening for complete homologies of 3 to 6 nucleotides and for 7 out of 10 homologies.
The graphical analysis of the base distribution was as described by Zachau et al. (1982).
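The dot matrix screening described above can be illustrated with a minimal sketch. It is not the program used in this work, which is not reproduced here; the function name `dot_matrix`, its parameters and the toy sequences are assumptions made purely for illustration. A window of 10 with at least 7 matches corresponds to the "7 out of 10" criterion, and a window of k with k matches corresponds to a complete homology of k nucleotides.

```python
def dot_matrix(seq_a, seq_b, window=10, min_matches=7):
    """Return 0-based (i, j) start positions at which the windows
    seq_a[i:i+window] and seq_b[j:j+window] agree in at least
    min_matches positions. Hypothetical helper, for illustration only."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Complete homologies of 3 nucleotides between two toy sequences
# (the sequences are invented, not taken from the figures).
exact_hits = dot_matrix("GATTACA", "TTAC", window=3, min_matches=3)
print(exact_hits)  # [(2, 0), (3, 1)]

# "7 out of 10" screening on toy sequences of length 10.
loose_hits = dot_matrix("AAAAAAAAAA", "AAAAAAATTT")
print(loose_hits)  # [(0, 0)]
```

Each hit corresponds to one dot in the matrix; runs of hits along a diagonal indicate an extended homology. The quadratic scan is adequate for sequences of a few thousand nucleotides, such as the subclones discussed here.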
3. Results
(a) Two families of dispersed repetitive DNA sequences located near immunoglobulin genes
A number of immunoglobulin gene-containing DNA fragments that were isolated in our laboratory are compiled in Figure 1. All the fragments also contain repetitive DNA sequences, which were first detected by blot hybridization with sonicated, unfractionated mouse DNA. Subclones containing the repetitive sequences were prepared from some of the fragments and sequenced. The sequences that hybridized with a BamHI subclone (see below) of fragment f-T were termed R sequences. They define the R family of repetitive DNA sequences. The subclone derived from fragment f-T contains the repetitive sequences R1 and R2, while sequences R3 to R6 are found on other fragments. Further R sequences were detected on several fragments (Fig. 1) but not sequenced. Some fragments also contain repetitive DNA sequences of the B1 type (Fig. 1), as was shown with the help of an authentic clone (Kramerov et al., 1979; Krayev et al., 1980). Members of the R family and the B1 family do not cross-hybridize. Since all hybridization experiments in this work were carried out in the presence of salmon sperm DNA, we would not have detected the evolutionarily conserved family of repetitive sequences described by Miesfeld et al. (1981).
(b) The repetitive DNA sequences R1 and R2 are arranged as inverted repeats
Fragment f-T is the only one described here in which two neighbouring repetitive elements, R1 and R2 (Fig. 1), were sequenced. The respective subclone was characterized by restriction mapping and blot hybridization with unfractionated mouse liver DNA (Fig. 2(a)). Subsequently, the subcloned fragment was fully sequenced (Figs 2(b) and 3). For more than 95% of its length, the sequence rests on data from both strands. The repetitive sequences R1 (1 to 468) and R2 (637 to 1191), which are separated by an A+T-rich region (167 bp), are arranged as incomplete inverted repeats: 328 bp each of the two R sequences are palindromic (underlined in Fig. 3), while the sequences between the underlined regions differ in R1 and R2 with respect to both composition and length. R2 is 87 bp longer than R1, mainly because of two unrelated DNA stretches in R2 (centred around positions 730 and 920 in Fig. 3). Fragments from the R2 region did not hybridize with mouse liver DNA under our conditions of blot hybridization (see the legend to Fig. 2(a)). This can be explained by the fact that R2, although clearly a member of the R family of repetitive sequences, differs considerably in its sequence from the consensus sequence (see below).
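An inverted-repeat arrangement of this kind can be checked by comparing one element with the reverse complement of the other. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the short test strings are not taken from Fig. 3 (GGATCC is the BamHI recognition sequence, which is its own reverse complement). The coordinates are those quoted in the text above.

```python
# Translation table for Watson-Crick complements of the four bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left, right):
    """True if `right` is exactly the reverse complement of `left`."""
    return right == reverse_complement(left)


# Element lengths from the coordinates given in the text
# (positions are 1-based and inclusive).
R1_LENGTH = 468 - 1 + 1
R2_LENGTH = 1191 - 637 + 1

print(reverse_complement("GGATCC"))        # GGATCC
print(is_inverted_repeat("AAGC", "GCTT"))  # True
print(R2_LENGTH - R1_LENGTH)               # 87
```

For real elements such as R1 and R2, which are incomplete inverted repeats, an exact comparison would fail; a window-based comparison of one strand against the reverse complement of the other would be needed instead.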
(c) Sequences of the repetitive elements R3 to R6
Subclones containing the four repetitive DNA sequences R3 to R6 were prepared from the respective fragments (Fig. 1), and characterized by restriction mapping and sequencing (Fig. 4). The sequences are shown in two Figures; the repetitive
FIG. 1. Localization of repetitive DNA sequences on immunoglobulin gene-containing fragments. EcoRI fragments of Balb/c mouse liver DNA (L1 to L9) and DNA from myeloma T (T1, T2, and f-T) are shown. Most fragments have been isolated and characterized previously: L1, T1, T2 (Steinmetz & Zachau, 1980); L2 to L5 (Steinmetz et al., 1980b); L6 (Steinmetz et al., 1980b; Pech et al., 1981); L7 (Pech et al., 1981); L8 (Höchtl et al., 1982); f-T, previously designated T3 (Steinmetz et al., 1980a; Höchtl et al., 1982). L1 to L8 contain immunoglobulin gene segments in the germline configuration, while rearranged genes are found on T1 and T2. f-T consists of linked flanking sequences of variable (V) and joining (J) gene segments. Constant gene segments are designated C. The 7.3 kb fragment L9 is related to the flanking sequences of L8 (Fig. 3 of Höchtl et al., 1982) and probably also contains a V gene segment. Only those restriction sites are shown that are relevant for the subcloning (Figs 2 and 4) and for hybridization experiments: H, HindIII; Ba, BamHI; B, BglII; S, SstI; X, XbaI. Blot hybridizations were carried out with sonicated mouse liver DNA (2 µg/ml; 4 × 10^7 cts/min per µg; long-waved lines for strongly hybridizing fragments and short-waved lines for weakly hybridizing fragments), with the R1/R2-containing fragment (80 ng/ml; 3 × 10^7 cts/min per µg; broken lines), and with the B1 sequence-containing clone Mm31 described by Kramerov et al. (1979) (30 ng/ml; 2 × 10^8 cts/min per µg; dotted lines). The locations of the sequenced repetitive elements R1 to R6 are indicated by triangles, which also designate the relative direction of the sequence within the subclones. The orientation of the subclones relative to the immunoglobulin gene segments is based on restriction mapping data for the R1, R2, R5 and R6-containing fragments, with the first 2 assignments still being tentative. Since the V gene segments on the R3 and R4-containing fragments have not been sequenced (broken boxes), it cannot be decided whether the repetitive sequences are located upstream or downstream.
FIG. 2. Characterization of the R1 and R2-containing fragment by restriction mapping, blot hybridization and sequencing. A BamHI fragment of fragment f-T (Fig. 1) was subcloned in pBR322 (R1/R2-containing subclone). In the digestion experiments shown in (a) I to V, the whole subclone was used; in (a) VI and (b) a large HhaI fragment isolated by isokinetic sucrose gradient-centrifugation as shown in Fig. 1 of Altenburger et al. (1980). (a) Restriction nuclease digests of 0.5 µg DNA each were electrophoresed on 2% (w/v) agarose gels. Ethidium bromide-stained lanes (left) are compared with the corresponding autoradiograms (right) obtained after hybridization with mouse liver DNA as in Fig. 1. A summary of the hybridizing fragments is given underneath the gel pictures. The differences in blot hybridization between the R1 and R2-derived fragments are clearly seen in the SstI digest of the HhaI fragment: the 299 bp fragment derived from R1 hybridizes, while the 375 bp fragment derived from R2 does not (lane VI). The BamHI-SstI digest of the subclone did not allow such a distinction, since the 251 and 241 bp fragments (broken lines; lengths are always counted without sticky ends) are indistinguishable in the blot (lane III). (b) Partial restriction map and sequencing strategy (thin arrows) of the R1 and R2-containing fragment. Small vertical lines indicate nuclease cleavage sites. The arrows represent only the sequences that have been actually used. The dotted thick arrows designate the repetitive elements R1 and R2 as defined by the sequences (Fig. 3).
~ATCCATCiC*TAATCnGCc*cTAAAcc~*AAcAcTATiTATTCCAGTAAGATTTiGCTGAAAGGACACTGATATAGCTGTCTCT~ CCTAGGTAGAGTATTAGTCGGTGATTtGGGTTTGTGATAACGTATAAGGTCATTCTAAAAC~CTTTCCTGTGACTATATCGACAGAGAA ----GTGAGACTAiGCCAATGCCiGGCAATACiTTAGTOOATiXTCA ---CACTCTGATACGGTTACGGACCGTlTATGTAAT~ACCTAC~6TGTCAGTTGATMCCT~~6T6~CC~TTACCTCCTC6ATC ----
-ssr
-
AGAAAGTACCCAAGGGGCToAA~GGTCTCiCAACCCTAT~~T~A~A~CAATATGAA~TAACCAGTA~CCCCAGA~TCGTGTCTCTA --TCTTTCATGGGTTCCCCGACTTCCCCAGACGTTGGGATATC~CCTTGTTGTTATACTTGATTG~~ATGG~TCTCGAGCACAGAGA~ ----------
270
--
GCTGCATATI~TAGTAGAAG~TGGCCTAGT~GCCATC AGAGAGGCCCCTA6GCCTCTCTT;AiAfACCCCAGi CGACGTATACATCATCTTCTACC~TCADCMibTAGTAAC~TTCTCTCC~~AT~Q~GAA~TATAT6~QTCATGTCCCC~G -------GCCAG6GCTiAGAAGTGocAGTG~TGGG~AG66AAQCA~GGCAGG~G~GGGTATAGG~ACTTTCAG~ATAGCATT --QCL-L-, CGGTCCCGATTCTTCACCCTCACCCACCCATCCCTTCGTCCCGTCC~CTC~ATATCCCTTGAAAGTCCTATCGTAAACTTTACATTT -v---
GAAATGTAAAT -319
AAATAAAGAAAA?i?%mTAiTAATTTATAiTAATAATAAiAAGAATAATA TTTCATAGATTATTTTTTCTTTTTTTCTTTACATTTATTTCTTTTATAGGTTATTTTATTATTAAATATTATTATTATTGTTCTTATTAT --GAAACATCTiTATAAGAATAGTTGAAGATiTTTCTTTCTC CTTTGTAGAAATATTCTTATCAACTTCTAAAAAGAAAGACTTCCTTCTTCTTAATCTGAT~TGTTTGTCGTTTTCTACTCTGGGTCGTA T*TCiGTTTiTCAT~~GATiTTTTACATliTT*TATTTCi ATAGACAAAAAGTAATCTATAAAATGTATAAATATAAAGTTTACAATAGG~TAGGAGTCCAA~AT~~G~GGTCTTTGGGTGATA --CCTATGTCTkTCAGCCTGiTTCTATTAGtiGTTTGTCCC -=Tr-CCCAACAAcACATTCCATcCTCCCTCCCCTCACTTTCTC --GGATACAGAGGAGTCGGACGAAGATAATCCCAAACAGGGGTGGGTTGTTGTGTAAGGTAGGAGGGIGGGGAGTGAGTATATGTCCC -----------900 GCATCCAGCCTTCACAGGACCAAGGGCAT~TTCTCCCAT~AAAGCCCAT~AAGGCCATC~TCTGTTATA~ATAAGTCTG~ATCCATGGGT
GGGAGGTCCCCATGAGAAACCATCAACTAAATCTGGGACCCTCGAGACCAACCAACTATCA~AAAAAGAAQGATACTCCAACGTTTTG~ ------TTCAGCTCCiTAAGTCCTTiCTCTAACTC~CCCATTAGG~ACTGCTTGC~CAGTCCAAT~GTTGGCTGT~AGCATCTGC~TCTGTATTTG AAGTCGAGGAATTCAGGAAAGAGATTGAGffiGGTAATCCTTGACGAACGAGTCAGGTTACCAACCGACACTCGTAGACGGAGACATAAAC --------
-
TTAGCTCTGcAAGAGCTTC~CAGGAGACAOCTATAfC AAT~AGACCTTCTCGAAGAGTCCTCTGTCGATATAGTCTGAGGACAGTCGTACATGAAGAAATGTAGGTGTTATCACTCACCCAAACCA -------GTCTGTATGi66GATGGi%--CAGACATACACCCTACCTAGG ---
FIG. 3. Sequence of the R1 and R2-containing fragment. Both strands of the DNA are written in order to show the palindromic character of the arrangement of R1 and R2 by underlining the sequences that are identical in the 2 elements. An homology between positions 394 to 401 and 736 to 743 was neglected. Underlining with the mark a in positions 21 to 41 signifies an homology to mouse 4.5 S RNA (see the text). Superscript arrows designate: (1) an inverted repeat centred around positions 324 to 327; (2) long direct repeats starting at positions 440 and 477 (arrows interrupted at non-homologous positions); (3) and (4) 2 pairs of direct repeats (starting at positions 706, 760, and positions 904, 938) flanking stretches of DNA of unknown origin (positions 707 to 760 and 901 to 934; see the text). The hyphens have been omitted for clarity.
DNA sequences themselves are shown in Figure 5 and the flanking sequences are shown in Figure 6. It is difficult at present to define exactly the borders of the repetitive elements. At the 3′ side they extend into A+T-rich sequences, which, of course, also show some homology (see below). At the 5′ side, the border is defined mainly by the fact that adjacent fragments do not hybridize with mouse liver DNA (Fig. 1) and by the occurrence of short flanking direct repeats in the case of R4 (see below). In the R3-containing fragment, 1587 bp have been sequenced. Here, the nonrepetitive flank (Fig. 6) extends until position 262 (Fig. 5), leaving a repetitive DNA sequence of only 216 bp. In all, 1070 bp of the R4-containing subclone were sequenced; that is, the 473 bp long repetitive element and a few hundred base-pairs on both sides. An 876 bp long sequence was determined for the R5-containing fragment, with 401 bp for the repetitive element itself, and again a few hundred base-pairs on both sides. As in the case of R3, the repetitive DNA region of R5 is shorter, only starting at position 71 (Fig. 5). In the R6-containing fragment, the
FIG. 4. Partial restriction maps and sequencing strategies of R3 to R6-containing fragments. The R3 to R6-containing fragments (Fig. 1) were subcloned in pBR322: (a) a 1.4 kb BglII-EcoRI fragment for R3, (b) a 2.3 kb EcoRI-BamHI fragment for R4, (c) a 1.6 kb EcoRI-HindIII fragment for R5, (d) a 5.3 kb HindIII-EcoRI fragment for R6, from which a further subclone (1.0 kb BglII-BamHI) was derived on the basis of blot hybridization experiments with the R1/R2-containing subclone (Fig. 2; 4 × 10^8 cts/min per µg; 4 ng/ml). Arrows and small vertical lines have the same meaning as in Fig. 2(b). The arrows at the bottom of (a) to (d) indicate the reading directions and the Figures where the sequences are shown.
90 GGATCCATCiCATAATCAG~CACTAAACCCAAACACTATiGCATATTCC~GTAAGATTT~GCTGAAAGG~CACTGATAT~GCTGTC GGATCCATCCCACATACAGACACCAAACCCACTCACTATTGTGGATGTAAAGAAGTACATGCTGACAGGAGTCTGATATAGCTGTC
’
-----_-__----_-_________________________--------------------------------------------------
GGATCCATCCCATAATCAGCTTCCAAACGACGACACCATTGCATACACTAACAAGATTTTGCTGAAAGGACCCAAATATAGCTGTC ----------------------------------------------------------------------TTTTGACA G TCT TAGT GGATCCTGACATAGCTCTCTCTT GGATCCATCCCATAATCAGCCACCAAACCCANACACTATTGCATATNCNAANAAGATTTTGCTGAAAGGANNCTGATATAGCTGTC -----
--
--160
TCTTGTGAGACTATGCCAAiGCCTGGCAAATACATTAGTPATGGA TCCTGAGAAGCTCTTCCAGAG CTAACAAATACAGAGGCAGATGCTCACAGCCAACCATTGGACTGAGCAAGCAG
iCACAGGGCCtCCAATGGAG(; TTCCTAATGGGGG
TCTTGTGAGACTATGCCGGGGCCTAGCAAACGCAGAAGTGGATGCTCACAGTCAGCTATT~ATGGGTCACAATG ACTTGTGAGACTATGCCAGGGCCTAGCAAACACAGAAGTGGATGCTCACAGTCAGCTATTGGATGGGTCACAC TCTCGTGAGGCTATGCCAGTGCCTGGCAAATACAGAAGTGGATGCTCACAGTCATCTATTGGATGGG
CCCCCAATAGAGG GGCCCCCAATGGAGG ACACAGGGCCCCTAATGAAGG
TCTTGTGAGACTATGCCAGNGCCTAGCAAATACAGAAGTGGATGCTCACAGTCANCTATTGGATGGGTCACACAGGGCCCCCAATGGAGG -----
--
a
AGCTAGAGAiAGTACCCAAGGGGCTGAAGGGGTCTGCAACCCTATAGGT~GAACAACAAiATGAACTAA~CAGTA AGTTAGAGAAAGGACTTAAGGAGCTGAAGGGTTTTGCAACCTCATAGGAAGAAAAACACTATCAACCAACCAGAG ----------------------------------------------------------------------------------GAG AGCTAGAGAAATTACCCAATGAGCTAAAAAGGAACTGCAACCCTATAGGTGAAACAACAATATGAACTAACCAGTA AGCTAGAGAAAGTACCCAAGGAACTAACTTCAACTTCAACCCTATAGATGGAACAACAATATGAACTAACCAATA AACTAGAGAAAGTATCCAAGGAGCTAAA~GGTCTGCAACTCTATAGGAGGACTAACAATATGAACTAACCAGTAACCCCCAGAG AGCTAGAGAAAGTACCCAAGGAGCTAAAGGGNTCTGCAACCCTATAGGTGGAACAACAATATGAACTAACCA6TA _--------GTGTCTCTAkTGCATATGiAGTAGAAGAiGGCCTAGTCk ATGGATTCAGACTTATATATAACAGAAGATGGCCTTCATG TGACTCTAGCTGCATATGTATCAAAAGATGGACTAGTCG TTGACTCTAGCTGCATATGTATCAAAA6ATAACCTAGTCG TTGTCTCTAtXTGCATATGTATCAAAAGATG6CCTAGTCG GTGTCTCTAGTTGCATATGTAGCAAAAGATGGCCTAATAG NTGTCTCTAGCTGCATATGTATCAAAAGATGGCCTAGTCG ----_
CTC 360 GCCATCiTTGGGAAGdAGGCCCCTA6G CCTCT ‘CTTTATATAC GGCTTTAATGGGAGAAGATGCCC TTGGTCCTGTGAAGGCTGGATGC GCCATCACT66AAAGAGAGGCCCATTGGACTTGCAAAACTTTATATGC GCCATCACT66AAAGAGA66CCCATTGGACAT66AAACTTTATATGC GCCATCACTG6AAAGAGAAGCCCATTG6ACTTGCAAACTTTATATGC TCGGCCACCAATGAGAGGAGAGGCCC TTGGTTTTTTGAAGATCATATGC
-
CCCCCAGAG --
GCCATCACTGGGAAGAGAGGCCCATTGGACTTGTAAACTTTATATGC ------_ ---
CCCAGTACAiGGGAACGCCiGGGCTAnGni GTGGGAGiGGGTGGGTAiGG CCCTGTATATGAGAAAGTGAGG6GAGG6AG GATGGAATGTGTTGTTGGG CCCAGTACAG6GGAACACCADOGCCAAAAA 6TGAGAATG66TG66TAG66G CCCAGTACAGG6GAATGCCAGG~CAAAATAAATGGGACTGG6TG~TAGGGA CCCAGTACAGTGGAACGCCAGGGCCAAAAA GCTGGAGTGGGTGOOTAG6TG CTCAGTACAGAGGAATGCCAGGGCCAGAAA GCAGGAGTGGGTGGGTTGGGG CCCAGTACAGGGGAACGCCAGGGCCAAAAA ----
b 270 I CCCkAGAG CT6 CTCCCAGGG*ACCC CTC CCCCGGAG CTC CCCCGGAG CTC CT
GTGGGAGTGGGTGGGTAGGGG ---
%
450 AGCAGiGCAGGG6GAbGGTATA666kACTTTCAGGi G GG6 AG 66GTAOsAACCTGAG6AT GGG6A AGTTGGG ffiGC666TAT66666ACTTTT6GGA AGTCGGG GGGAGMiTAT66666ACTTTTGG6A A TT66G A GGGTG66TAT60666ACTTTTGGGA AGCAGGGCAG GGGAAGGTATAGGOOGCTTTGGGGA AGTNGGG -
TAGCATTTGiAATGTAAAT’ AAAGTAiCTAATAAAAiA TAACATTTGAAATATAAATATGTAAAATATCTAATGAAAAA TATCATTCGAAATGTAATTGAGGAAAATACGTAATAAAAAA TA6CATTGGAAATGTAATT6AGGAAAATATGTAATAAAAAA TAGCGTTGGAAATGTAAACGAGGAAAATACCTAATAAAAAA TAGCATTTGAAATGTAAATGAAGAAAATATCTAATAAAAAA
Rl R2 R3 R4 RS R6
TAGCATTTGAAATGTAAATGAGGAAAATATCTAATAAAAAA -----
R COnSenSUS
AG GGGAGGGTATGGGGGACTTTTGGGA ---
FIG. 5. Sequences of the repetitive elements R1 to R6 and derivation of a consensus sequence. The sequences are arranged to maximize homology. A uniform counting is used. For R2 and R6, the complementary strands are shown. Broken lines in the R3 and R5 sequences indicate sequences that show no apparent homology to the other sequences and which are reported in Fig. 6. The R6 sequence is incomplete, since sequencing started at the BamHI site (position 68). The asterisks at positions 266 and 414 of the R2 sequence indicate non-homologous stretches of DNA, which in Fig. 3 appear in positions 707 to 760 and 901 to 934. Nucleotides in the consensus sequence are named if they occur in this position in the majority of the sequences R1 to R6; if there is no clear majority, the nucleotide is named N. Nucleotides that are identical in all sequences (except R2) are underlined once in the consensus sequence. Invariant restriction sites are underlined twice. A vertical line at position 262 separates parts a and b of the repetitive sequences (see Discussion). The hyphens have been omitted for clarity.
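The majority rule used for the consensus can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used to draw Fig. 5: "majority" is taken here to mean a symbol present in more than half of the aligned rows, a gap is written as "-" and treated like any other symbol, and the function name `consensus` and the toy alignment are hypothetical.

```python
from collections import Counter


def consensus(aligned, ambiguous="N"):
    """Derive a consensus from equal-length aligned sequences.

    A symbol is named at a position if it occurs in more than half
    of the rows (assumed reading of "majority"); otherwise the
    position is named with the `ambiguous` symbol.
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        result.append(symbol if 2 * count > len(column) else ambiguous)
    return "".join(result)


# Toy alignment, invented for illustration (not the R1 to R6 sequences).
toy = ["ACGT", "ACGA", "ACCA", "AGTC"]
print(consensus(toy))  # ACNN
```

In the toy alignment the third and fourth columns have no symbol present in more than two of the four rows, so both are named N. A stricter or looser threshold would change which positions are named, which is why the rule is stated explicitly.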
R3 flanks GCGCCGGTGATGCCGGCCACGATGCGTccGGCGTAGAGGAATcAGATc --
0
*CCnCnGnc~GnTiT~T*~*TA*AT*~TAGA*AGTC~ACATTC*TGiA*A*GCTG*~C*ACACTCT~TTC*ATGA~~ @ nCTTGGTCAiGGGAAAnnniA*GG**G*A~TTAAAGACTiTTTAGAGTTiA*TGAAAGAAGccAcAA~*TAcccATG~TTTTGGG*~~ G**AGCAAGCnTTCCTA*G~GGAAAACTT~T*GcTcTGA~TGc*Gcc*A~*cG*AAcT*~*G*G*GcAT~c*cT*GcAG~cTGGc*Gc~~ L. ACCTAAAAGCTAGAGAACAAAAGGAAGCAiATTTA TCATGGCAGiAATTAACCAAGTGG~A~CTAiAACAAACCCAAGAGCTTGTiCTTTGAGAAAATCAACAA~~ TAGATAAATkTTAGCCAGkTAACTAAAkTGCACAGGGAGAGTATCA
AAAGGAGACATAACAAC::
AACCTAAAGAAATCCAAAACAC-TETZTAAAAGGCTAT
GATGACATGGACAATTT::
TCGACAGGTACCACCTACCAAAGTTAATCAGGATCAGACY GGTCTCCCAiACAAAAAAGGTCTAGGAACATATGGGTTTiGGGAAGAOTiCTATCAAACCTTCAAAGAAGACCTAATTCiAATATTC~~~ AAACTATTCCACAAAATAAiAACAGAAGT~ACTCTATCC~ATTCATTCT~TTTAGCCAT~ATTACTCTA~ACCTAAACC~CACAAAG~~~ CAAAAAAATiAGAATGTAAGACCAATTTC~CT~ATGAAT~TCGATGCAAAAATACACAAiGAAATCCTT~CAAACCGAT~CCAAGAA~~~ ATCAAAATGiTCATCCATCATGATCAAGA~GTATCATCC~AGGGATGCA~GGATTGTTT~ATATATGGA~ATCCATCAA~ATAATCC~~ ATATAAACAAACTCAAAGACAAAAACCACATGATCATCT~ATTAGATGCiGACAGAGCAiTTGAAAAAAiCCAACACACATTCATGA~~~ AAGTCTTGGiAAGATCAGGnATTCTTGAAGACGAAAGGGC TTAGACGTCiGGTGGCACTiTTCGGGGAAiTGTG:::
R4 flanks
cs
CCACCTTCTiGAGTTCTTTATATATATTGGATATTAGACCCCTATCTGAiTTAGGAGAG~TAAAGATCCiTTCCCAATC~GTT 8 -Q --ACACCCATGGAAGGAGTTACAGAGACAAAGTTTGGAGCTGTGACGAAAGGATGGAC~T
Groo
ZGACTG~CG&ATCCAG
m-
--.--.--.---
-
TGAC~AA~~TGGAC~CAf
- b
R4 . . . . . . . . ..~TATTTAAAAAATTATCCAiAGAAAACTGiTAATTAACA~TGTTAATCAT
TATTAATGATGTCAATATGiATATAGCTGGTTGAAATGTi~ CTATTAAAAiTTGTTAGTGnAATATGATTAATAATAGAAACCTCTTCAACAAATTATAAiAATGTTAACiAGCCATACCACTTAGTC~~9TOOA GAATATTTCTTCACAATAAACATTTTGTATTGATGAATAAAACTGTTGAAAGCTAAAAAAAAAAAAAA~ Q TTTCCCAATCTGTTGGTGTCCTTTTTGTCTTATTGACAGTGTCTTTTTGTCTTACAGAA~CTTTGTAATiTTATGAG~~~
R5 flanks
GAATGTAGAoTAAGAAATGcTGATGGAAGCAnTACAATCi .@I-* @ TTATATTGTACTCACAGTTCAATGAAATATGTAAAGAAGCATGCACAGG~GTCTGTCAA~TACAAACACTA 388 BOO . . . . . . . . . . . FATAGAAAAAAAA GGAGCCAGAiTATAGGCAGiGAAGGGCAAiGTGGCCTG............R6 e_e AAAACAAAAACAAAAACAGAAACACAATAl+GAAGCCAGCkTAT?
R6 flank . . . . . . . . . . . . ;R6..
. . . .-. . .‘...
.;l;GAATGAATGTTGAATCTTATCAAAGACTTTTTCTTTCTTTTTTTTTCCTTCCATiGAGGTGA~~~
TATGATTCTiGTATTTTTTiGTCCATGTAiTTGGTTTAT~GCATTTATT~CTTATAAAT~TTAAAAATA~AAAATAAAA~CAGTGGA~~~ .I AAAAAAATTiTACATTAAAiATATTGAAAGTAAAAGATTTTCTAGAAGC~A~~ TACCCCCACcGATTAATGCcAAGCCTAGA~ACTTTTGCTAAAACTTGGG~TCACT
about
250bp
TCCTTTGTGiAATCCATG::
CCT&GGCATGGAGbTGTATGCCTiTATTCCCA:: --~&
ACTTGGGAGkAGAAGAAGkAGAT;? ----a -
FIG. 6. Flanking sequences of the repetitive elements R3 to R6. The numbering of the sequences is interrupted by the repetitive elements themselves (Fig. 5), which are indicated by dotted arrows. Position 698 in the R3 flank and positions 244 to 248 in the R5 flank are not covered by sequencing (Fig. 4) but are deduced from restriction cleavage. For the R6 fragment, the complementary strand is shown; since it contains a stretch of unsequenced DNA (about 250 bp, see Fig. 4(d)), the numbering of the last 137 bp starts again from position 1. The T in position 137 is deduced as part of the original BglII site. Direct repeats are marked by numbered superscript arrows, which are interrupted at nonhomologous positions. DNA stretches that are of special interest because of homology to other DNA or RNA sequences (see the text) are underlined and designated a to c: in the R3 flank, sequence a (positions 791 to 810) is homologous to positions 23 to 43 of rat U3 RNA (a complementary strand is used); in the R4 flank, sequence b (positions 85 to 192) is homologous to the subrepeat of mouse satellite DNA; positions 77 to 137 in the additional fragment of the R6 flank, which are marked c, are homologous to positions 1 to 61 of the B1 consensus sequence. The hyphens have been omitted for clarity.
known sequence starts at the BamHI site at position 68 (Fig. 5), defining a repetitive element of 414 bp, a downstream flank of 209 bp and, further downstream, a sequence of 137 bp. Note that the directions in which the sequences are written are given by the thick dotted arrows in Figure 4.
(d) Copy number of R sequences in the mouse genome
Saturation hybridization experiments were carried out in order to estimate the abundance of R sequences in the mouse genome. A number of R sequence-containing clones were immobilized on filters and the percentages of radioactive total DNA retained by the filters determined. The results are given in Table 1. R3 and R4-containing clones were selected because these two R sequences show a low divergence from the consensus sequence. R3 is an example of a truncated sequence and R4 is an example of a full R sequence. Fragment L4 contains, in addition to the R sequence, at least one B1 sequence (Fig. 1). For comparison, cloned mouse satellite DNA was included in the experiments. The values given in Table 1 are plateau values with respect to saturation conditions during hybridization (see the legend to Table 1) and with respect to the time of treatment with S1 nuclease (usually 30 min). These variables had been checked in separate experiments.
TABLE 1
Determination of relative amounts of repetitive DNA sequences in the mouse genome

DNA clones    Percentages    s (n)
Satellite     5.9            ±0.4 (7)
L4            2.7            ±0.4 (8)
R4            1.7            ±0.3 (10)
R3            0.8            ±0.1 (7)

Saturation hybridization experiments were carried out in a manner similar to that described by Kramerov et al. (1979). A satellite DNA clone (see Materials and Methods), fragment L4 in pBR322, and the R3 and R4-containing subclones (Fig. 4) were used. A portion (10 µg) of each of the DNAs was immobilized on nitrocellulose filters and hybridized with sonicated (average 1 kb), nick-translated mouse DNA (20 ng each, 2.5 × 10^7 to 6.0 × 10^7 cts/min per µg; Čerenkov counting) overnight at 68°C in a volume of 0.5 ml; salmon sperm DNA was included as carrier. The filters were given a final wash with SSC (0.15 M-NaCl, 0.015 M-sodium citrate, pH 7) containing 0.5% (w/v) sodium dodecyl sulphate. Three sets of experiments with several filters each were carried out using different nick-translated probes. Treatment of the filters with S1 nuclease (0.8 unit) was in 0.8 ml each of 0.2 M-NaCl, 0.1 mM-ZnSO4, 25 mM-potassium acetate (pH 4.5), 0.2 mg bovine serum albumin/ml, 50 µg salmon sperm DNA/ml at room temperature. The percentage of radioactivity remaining on the filter relative to the total radioactivity of the probe was calculated and the background of a pBR322-containing filter (average 0.04%) was neglected. The average of n filters is given with the standard deviation s. The values in the Table are saturation values; essentially identical results were obtained with 1 and 5 µg of the DNAs per filter.
On the basis of the chain lengths of R3 and R4 (216 bp and 473 bp) and of the size of the haploid genome of the mouse (2.7 × 10^6 kb), the hybridization data given in Table 1 yield a copy number for the R sequences of 100,000. The value for L4 is higher than that for the R4 fragment because of the B1 content of L4, and because L4 possibly contains a stretch of R-like sequence in addition to R4. The satellite DNA-like sequence found near R4 (see Discussion) does not contribute to the hybridization values given in Table 1, since no fragment containing this sequence hybridizes with total mouse DNA in blots nor do the L4 or R4-containing clones hybridize with satellite DNA on filters. The value of 5.9% found for the satellite DNA content of mouse DNA is in the known range of 5 to 10%. Because our hybridization value is on the low side of this range, one may argue that the copy number of the R sequences could be higher than 100,000. Also, the fact that some R sequences (such as R2) are too far diverged from the consensus sequence (see Discussion) to hybridize under our conditions points to a higher copy number. However, on the basis of our experimental data, we propose for the time being to use the value of 100,000. It is noteworthy that, within the experimental error, the hybridization value for the truncated R3 sequence leads to the same copy number as the value for R4, which is a complete R sequence. This may mean that truncation is rare or that it affects the 5′ and the 3′ parts of the repetitive element to a similar extent. The copy number of 100,000 for the R sequences means that they occur also, in all likelihood, outside the immunoglobulin gene regions. It is reasonable to assume that they are distributed throughout most of the mouse genome.
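The arithmetic behind the copy number estimate can be sketched as follows. This is a simplified reconstruction, assuming that the percentage of probe retained on a filter equals the fraction of the genome made up of the cloned repetitive element; the function name `copy_number` is hypothetical, and the percentages (0.8 for R3, 1.7 for R4) are the values as read from Table 1.

```python
# Size of the haploid mouse genome used in the text, in kb.
GENOME_KB = 2.7e6


def copy_number(percent_hybridized, element_bp, genome_kb=GENOME_KB):
    """Estimate copies per haploid genome from a saturation value.

    percent_hybridized: percentage of total DNA retained on the filter
    element_bp: length of the repetitive element in base-pairs
    """
    hybridizing_kb = genome_kb * percent_hybridized / 100.0
    return hybridizing_kb / (element_bp / 1000.0)


# Values as read from Table 1 and the chain lengths given in the text.
r3_copies = copy_number(0.8, 216)
r4_copies = copy_number(1.7, 473)
print(round(r3_copies), round(r4_copies))
```

Under these assumptions both the truncated R3 element and the complete R4 element give a value close to 100,000 copies, in line with the figure proposed in the text. The estimate ignores diverged family members that fail to hybridize, so it is best read as a lower bound.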
EcoRI-digested and undigested DNA from human placenta, African green monkey cells, rat and mouse liver, calf thymus, chicken blood, and yeast were electrophoresed and blotted. After hybridization with the nick-translated R1/R2-containing subclone, signals were seen only with the rat and mouse liver DNAs. With the EcoRI-digested rat DNA, the whole lane hybridized to the probe, although at a much lower intensity than with the corresponding mouse DNA digest. It cannot be decided at present whether this is due to a smaller copy number of R-like sequences in the rat genome or to sequence differences between such elements of the rat and the mouse.
4. Discussion
(a) ... parts of the R sequences
The six repetitive elements of the R family sequenced in this work are compiled in Figure 5 and a consensus sequence is derived. A number of regions are fully conserved in the sequences R1 and R3 to R6; for example, 18 bp sequences at positions 53, 127 and 235, an 11 bp sequence (position 146) and 10 bp sequences (positions 11, 282 and 482). The regions between the blocks of conserved sequences differ from sequence to sequence. These differences are expressed as divergence (in %) from the consensus sequence in Table 2.
TABLE 2
Divergence of repetitive sequences of the R family

                   R1      R2      R3      R4      R5      R6
Whole sequence     8.7     28.5    6.9     8.2     8.7     14.0
Part a             5.5     22.1    -       9.0     11.8    11.8
Part b             12.4    36      6.9     6.9     6.0     16.0
Divergence is defined as the percentage of nucleotides different between the particular sequence and the consensus sequence (Fig. 5). In the cases of insertions and deletions, the numbers of base-pairs involved are counted, except for the 2 big insertions in R2 (marked by asterisks in Fig. 5), which are counted as 1 bp each (if all base-pairs were counted, a divergence of 76% would be obtained). Parts a and b of the repetitive sequences range from positions 1 to 262 and 263 to 475, respectively, but R5a and R6a start at positions 71 and 68, respectively (Fig. 5).
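The divergence definition in the Table 2 footnote can be expressed as a short routine. This is a minimal sketch under stated assumptions, not the authors' program: the function name, the use of "-" as the gap character, and the choice of the consensus length as the denominator are all assumptions made for illustration, and the sequences in the example are toy strings rather than data from Figure 5.

```python
def divergence(seq, consensus, big_insertions=()):
    """Percent divergence of an aligned sequence from the aligned consensus.

    seq, consensus: equal-length aligned strings; '-' marks a gap.
    big_insertions: (start, end) column ranges (0-based, end exclusive)
        that are counted as a single difference each, as is done for
        the two large insertions in R2.
    Assumption: the denominator is the number of consensus nucleotides.
    """
    if len(seq) != len(consensus):
        raise ValueError("aligned strings must have equal length")

    # Columns inside a big insertion are not counted one by one.
    skipped = set()
    for start, end in big_insertions:
        skipped.update(range(start, end))

    differences = len(big_insertions)  # each big insertion counts as 1 bp
    for column, (a, b) in enumerate(zip(seq, consensus)):
        if column not in skipped and a != b:
            # Substitutions and single inserted or deleted base-pairs
            # each count as one difference.
            differences += 1

    consensus_length = sum(1 for b in consensus if b != "-")
    return 100.0 * differences / consensus_length


# Toy example: one substitution in ten consensus positions.
print(divergence("ACGTTCGTAC", "ACGTACGTAC"))
```

Passing the column ranges of a large insertion as `big_insertions` lowers the result accordingly, which mirrors the footnote's remark that counting every inserted base-pair in R2 would give a much higher value.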
For reasons of convenience, we define two parts of the sequence that we call a and b. The border was set at the SstI site at position 263, which is also the region where in the R2 sequence 34 additional nucleotides are inserted and where in the R3-containing fragment the repetitive DNA sequence starts (R3 consists only of the part b sequence). Parts a and b of the R sequences differ with respect to divergence from the consensus sequence (Table 2). The divergence of the parts ranges from 5.5 to 36%, with averages of 12.0% and 14.0% for parts a and b, respectively. These values are comparable to values for the human Alu family, where 8.3 to 22.6% was reported as the range and 12.3% as the average divergence (Deininger et al., 1981). In the mouse B1 family, where only three sequences have been reported, the average divergence is 4% and no two sequences differ by more than 8% (Krayev et al., 1980). A recently sequenced member of the B1 family located near a mouse β-globin gene (Coggins et al., 1982) shows 11% divergence from the consensus sequence given by Krayev et al. (1981). Most of the divergence is due to single base changes, which are predominantly transitions: 38% transversions were found in R1 to R6 compared to the consensus sequence. The corresponding figure for the human Alu family is 28% (Deininger et al., 1981). There are about as many purine-pyrimidine as pyrimidine-purine transversions, which means that the purine-pyrimidine ratio of 1.41 is largely conserved. Also, the lengths of the repetitive DNA sequences are similar, since the number and lengths of insertions, excluding the two big ones in R2b, are approximately balanced by deletions. The 22.1% divergence of R2a explains also why the corresponding 357 bp SstI fragment did not hybridize with unfractionated mouse DNA (Fig. 2(a)VI). There are no hybridization experiments with R2b but, considering its even higher divergence of 36%, it is unlikely that it will hybridize under our conditions. However, R2b can be identified as part of an R sequence; for instance, on the basis of 10 and 12 bp homologies between R2b and R1b (positions 665 and 435, 862 and 285 in Fig. 3). There is no difficulty in finding homologies between R2a and R1a.
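The distinction between transitions and transversions used above can be made concrete with a small sketch. The function names and the toy sequences are hypothetical; only the chemical classification (purine to purine or pyrimidine to pyrimidine is a transition, any purine-pyrimidine exchange is a transversion) is standard. Gap columns are ignored here, which is an assumption about how single base changes would be tallied.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Return 'transition' or 'transversion' for the substitution a -> b."""
    if a == b or a not in BASES or b not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class on both sides means a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def transversion_percent(seq, consensus):
    """Percentage of single base changes that are transversions.

    seq and consensus are aligned strings of equal length; columns
    containing a gap ('-') are skipped.
    """
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq, consensus):
        if a != b and a != "-" and b != "-":
            counts[classify(b, a)] += 1
    total = sum(counts.values())
    return 100.0 * counts["transversion"] / total if total else 0.0


# Toy example: one transition (A->G) and one transversion (G->T).
print(transversion_percent("GCGTACTT", "ACGTACGT"))
```

With four possible transversions but only two possible transitions per base, purely random substitution would give about 67% transversions, so a value of 38% indicates a clear excess of transitions.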
The family of R sequences was initially defined on the basis of blot hybridization experiments. A definition on the basis of sequences, which is now possible, includes also R2 and possibly other further diverged sequences, which do not hybridize with other R sequences. The finding of more sequences of the latter type would, of course, influence considerably the average divergence value. The division of the R sequences into two parts does not imply that significant homologies between them were detected. Our computer search programs did not reveal a regular sub-repeat within the R sequences, but there are clear regularities in the sequences. Even superficial inspection of the sequences given in Figure 5 reveals blocks of certain nucleotides and nucleotide combinations. It is unclear whether and how such a clustering can be related to the evolution or maintenance of the R family of repetitive DNA sequences, nor is it known whether parts a and b of the repetitive DNA sequences arose separately and were then linked at some stage of evolution.
(b) Comparison of R sequences and their flanks with other sequences
No sequence homologies were detected with our computer search programs between the R sequences and other dispersed repetitive DNA sequences; for example, the repetitive sequence near the gene for a mouse histocompatibility antigen (Steinmetz et al., 1981), the Alu sequences of man (Deininger et al., 1981), Chinese hamster ovary cells (Haynes et al., 1981) and the B1 sequence of mouse (Krayev et al., 1980). This is in keeping with the finding of no cross-hybridization between our R1/R2 subclone and the Mm31 clone of the B1 family. It may be noted, however, that B1 sequences were found by hybridization in the neighbourhood of several R sequences (Fig. 1). Sequence analysis showed moreover that the 3' flank of R6 extends into a B1 sequence (Fig. 6). The sequenced part of it has a divergence of 15% when compared with the consensus sequence of B1 (Krayev et al., 1980). Several short regions of homology between R sequences and sequences of small nuclear RNAs were detected. Three examples are given. As indicated in Figure 3, there is a 19 out of 23 bp homology between the R1 sequence and the 4.5 S RNA of mouse (Harada & Kato, 1980). This includes a region of the 4.5 S RNA (positions 28 to 37) that is the only one to show no homology to the Alu sequence of Chinese hamster ovary cells (Haynes et al., 1981) or the mouse B1 sequence (Krayev et al., 1981). There are two complete homologies between 12 and 11 bp stretches of R5 (positions 71 to 82 and 252 to 262 in Fig. 5) and mouse 5.7 S RNA (positions 122 to 133 and 76 to 86; Kato & Harada, 1981). These homologous sequences occur exactly at the beginning and the end of part a of the R5 sequence. An 18 out of 21 bp homology should also be mentioned, which exists between a stretch in the 3' flank of R3 (positions 791 to 810; Fig. 6) and the rat U3 sequence (positions 23 to 43; Reddy et al., 1979).
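Homologies of the kind quoted above ("19 out of 23 bp", "18 out of 21 bp") correspond to about 83% and 86% identity over an ungapped stretch. A simple way to locate such a stretch is to slide the shorter sequence along the longer one and count matching positions. The sketch below illustrates that idea only; it is not the search program used in this work, the function name is hypothetical, the sequences are toy strings, and matches on the complementary strand are not covered.

```python
def best_ungapped_match(query, target):
    """Slide `query` along `target` without gaps.

    Returns (matches, start) for the window of `target` that shares the
    most identical positions with `query`; `start` is 0-based. The first
    window is returned when several windows tie.
    """
    if len(query) > len(target):
        raise ValueError("query must not be longer than target")
    best_matches, best_start = -1, -1
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        if matches > best_matches:
            best_matches, best_start = matches, start
    return best_matches, best_start


query = "ACGTAC"
matches, start = best_ungapped_match(query, "TTTACGAACGG")
print(f"{matches} out of {len(query)} bp at offset {start} "
      f"({100 * matches / len(query):.0f}% identity)")
```

For short queries, a high match count can arise by chance in a long target, which is why such similarities need to be judged against their statistical significance.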
Sequence similarities over short stretches that are not statistically significant are recorded here only for two possibly interesting cases. The T-G-G-G and A-G-C-T sequences that are abundant in switch sites of heavy chain immunoglobulin genes (Kataoka et al., 1980) also occur frequently in the R sequences, particularly in the
blocks of nucleotide combinations mentioned in the preceding section. Similarities exist also between parts of the Chi site of bacteriophage lambda (Smith et al., 1981) and, for instance, sequences around positions 700, 760, 900 and 940 of R2, which are located at the borders of large insertions in this sequence. Another interesting sequence similarity, albeit of debatable statistical significance, concerns the stretch between positions 100 and 190 of the 5' flank of R4 (Fig. 6) and the subrepeat of the mouse satellite DNA (Fig. 3(b) of Hörz & Altenburger, 1981). Homologies of 63 to 71% can be obtained when this part of the R4 flanking sequence is aligned with three satellite DNA subrepeats. The occurrence of mouse satellite DNA-like sequences in non-satellite DNA fractions has been observed (Haigwood et al., 1981; Stambrook, 1981).
Like the Alu and B1 sequences, the R sequences have very A+T-rich 3' flanks (Fig. 6). Wherever sequenced, the 5' flanks of the R sequences are rather A+T-rich. The A+T content of the R sequences themselves is lower than the average A+T content of the mouse genome. The very pronounced transition in A+T content from the R sequences themselves to their flanks is illustrated by Figure 7. In the stretch between R1 and R2, A+T contents of more than 90% are reached. It is noteworthy that there is a pronounced A+T profile with relatively high A+T contents near the transitions of parts a and b within the R sequences.
(c) Direct repeats flanking the R sequences may be related to transposition events
There are many direct repeats in the flanks of the R sequences (Fig. 6). Only repeats that either occur on both flanks of the R sequences or are particularly extended are shown in the Figure. The flanking direct repeats are found in close neighbourhood to the R sequences or at some distance. It is noted that in the 5' flank of the truncated sequence R5 all direct repeats are found at some distance, while in the 5' flank of R3, which consists only of part b, the repeats are found both close by and at some distance. No direct repeats are marked in the 3' flank of R6, since its 5' flank has not been sequenced.
The direct repeats found in the neighbourhood of the R sequences fit into the general picture that has evolved from the study of the human Alu sequences, the Alu-like sequences of Chinese hamster ovary cells and the mouse B1 sequences (Bell et al., 1980; Haynes et al., 1981): a direct repeat is followed by the repetitive DNA sequence itself, an A-rich sequence, and again a direct repeat. Pairs of direct repeats show homologies of 80 to 100%. The direct repeats in the 5' flanks are usually very close or immediately adjacent to the repetitive DNA sequences, while in the 3' flank, because of the length heterogeneity of the A-rich sequence, the distance is more variable. The location of the direct repeats may be related to the dispersion of the repetitive elements throughout the genome. Also, small nuclear RNA pseudogenes were found to be flanked by direct repeats (van Arsdell et al., 1981) which, in these cases, may serve a similar function as in the repetitive elements. An attractive mechanistic possibility was suggested by these authors and by Jagadeeswaran et al. (1981): transcripts of the repetitive elements (and the small nuclear RNA pseudogenes) are retrotranscribed and the complementary DNAs are inserted into the genome with formation of flanking direct repeats.
The formation of the R4-containing sequence may be explained by two consecutive transposition events (Fig. 8). (1) A satellite DNA-like sequence (positions 85 to 192) and an adjacent A-rich sequence (positions 684 to 696) are inserted into the original T-rich sequence, which is now represented as nucleotides 1 to 84 and 1006 to 1070. In this process, a 19 bp sequence is duplicated: the original sequence might have been located in positions 66 to 84 and the duplicate in positions 987 to 1005, but the alternative is equally possible. (2) In the second transposition event, R4 including a short A-rich sequence is inserted between the satellite-like sequence and the already present A-rich sequence with formation of a 6 bp direct repeat (positions 193 to 198 and 684 to 689).
FIG. 7. A+T content of the R1/R2-containing fragment. At every position, the A+T content of a 31 bp sequence centered around this position is plotted. Parts a and b of the repetitive sequences are marked and their average A+T contents are given.
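The A+T profile of Figure 7 is described as the A+T content of a 31 bp stretch centered on each position. A minimal sketch of such a sliding-window calculation is given here; the function name is hypothetical, positions are 0-based, and skipping the positions within 15 bp of either end is an assumption, since the legend does not say how the ends of the fragment were treated.

```python
def at_profile(seq, window=31):
    """A+T percentage of the `window`-bp stretch centered on each position.

    Returns a list of (center, percent) pairs with 0-based centers.
    Positions closer than window // 2 to either end are skipped.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    is_at = [base in ("A", "T") for base in seq.upper()]
    if len(is_at) < window:
        return []

    # Count A and T in the first window, then update the count
    # incrementally as the window moves one position at a time.
    count = sum(is_at[:window])
    profile = [(half, 100.0 * count / window)]
    for center in range(half + 1, len(is_at) - half):
        count += is_at[center + half] - is_at[center - half - 1]
        profile.append((center, 100.0 * count / window))
    return profile


# Toy example with a 3 bp window instead of the 31 bp used for Figure 7.
for center, percent in at_profile("AATGC", window=3):
    print(center, round(percent, 1))
```

A window of this size smooths out single-base fluctuations while still resolving sharp changes in composition, such as those at the borders between a repetitive element and its flanks.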
FIG. 8. Hypothetical scheme for the formation of the R4-containing sequence. The various sequence elements mentioned in the text are differently shaded and hatched. The arrows 5 and 7 designate the direct repeats of Fig. 6.
Other examples for sequences that may have resulted from transposition are the two stretches of DNA within R2 (positions 707 to 760 and 901 to 934 in Fig. 3): they are not homologous to the R sequences and are flanked by direct repeats. The transposition events in the R family can be discussed either on the DNA level only or on the complementary RNA-complementary DNA level, although it is not known whether the R sequences are transcribed.
(d) R sequences may be involved in recombination events
The occurrence of repetitive DNA sequences near genes raises the possibility of functional relationships. All R sequences described in this paper are located on fragments that hybridize with immunoglobulin gene-containing fragments. R1 and R2 occur as closely spaced inverted repeats on fragments f-T and L8 (Fig. 1). The sequenced elements R3 and R4, as well as the unsequenced repetitive elements on fragments L7 and L9, are located near variable gene segments. R5 and R6 are arranged as inverted repeats flanking the variable, joining and constant segments of a kappa light chain gene. Because of their high abundance and their probable distribution throughout the genome, it is unlikely that the R sequences play a specific role in the function or rearrangement of immunoglobulin gene segments. But they may very well also function in the immunoglobulin gene system in all sorts of recombination and transposition processes, as was discussed for the repetitive sequences in other organisms. Dispersed repetitive elements that are arranged as direct or inverted repeats may serve to mobilize DNA sequences located between them. They may function also as homology regions in unequal crossover or gene conversion (for a review, see Baltimore, 1981). Such processes may be involved in maintaining the relative sequence homogeneity of the repetitive elements themselves, and may also be discussed as a means to conserve neighbouring sequences (Miesfeld et al., 1981). Further work with the dispersed repetitive DNA will have to show to what extent they actually participate in the above-mentioned processes. This will then help in the ongoing discussion of the question of whether the processes serve mainly or only the repetitive sequences themselves, as proposed in the selfish DNA hypothesis (Doolittle & Sapienza, 1980; Orgel & Crick, 1980), or whether they serve the organism.
We thank H. Steger for expert assistance and G. P. Georgiev and colleagues for donating the Mm31 clone.
This work was supported by Bundesministerium für Forschung und Technologie and Fonds der Chemischen Industrie.
REFERENCES
Altenburger, W., Steinmetz, M. & Zachau, H. G. (1980). Nature (London), 287, 603-607.
Arnheim, N., Seperack, P., Banerji, J., Lang, R. B., Miesfeld, R. & Marcu, K. B. (1980). Cell, 22, 179-185.
Baltimore, D. (1981). Cell, 24, 592-594.
Baralle, F. E., Shoulders, C. C., Goodbourn, S., Jeffreys, A. & Proudfoot, N. J. (1980). Nucl. Acids Res. 8, 4393-4404.
Bell, G. I., Pictet, R. & Rutter, W. J. (1980). Nucl. Acids Res. 8, 4091-4109.
Biezunski, N. (1981). Chromosoma, 84, 111-129.
Brown, S. D. M. & Dover, G. (1981). J. Mol. Biol. 150, 441-466.
Cheng, S.-M. & Schildkraut, C. L. (1980). Nucl. Acids Res. 8, 4075-4090.
Coggins, L. W., Vass, J. K., Stinson, M. A., Lanyon, W. G. & Paul, J. (1982). Gene, 17, 113-116.
Deininger, P. L. & Schmid, C. W. (1976). J. Mol. Biol. 100, 773-790.
Deininger, P. L., Jolly, D. J., Rubin, C. M., Friedmann, T. & Schmid, C. W. (1981). J. Mol. Biol. 151, 17-33.
Dhruva, B. R., Shenk, T. & Subramanian, K. N. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 4514-4518.
Doolittle, W. F. & Sapienza, C. (1980). Nature (London), 284, 601-603.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., De Riel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980). Cell, 21, 653-668.
Grimaldi, G., Queen, C. & Singer, M. F. (1981). Nucl. Acids Res. 9, 5553-5568.
Gross-Bellard, M., Oudet, P. & Chambon, P. (1973). Eur. J. Biochem. 36, 32-38.
Haigwood, N. L., Jahn, C. L., Hutchison, C. A., III & Edgell, M. H. (1981). Nucl. Acids Res. 9, 1133-1150.
Harada, F. & Kato, N. (1980). Nucl. Acids Res. 8, 1273-1285.
Haynes, S. R., Toomey, T. P., Leinwand, L. & Jelinek, W. R. (1981). Mol. Cell. Biol. 1, 573-583.
Heller, R. & Arnheim, N. (1980). Nucl. Acids Res. 8, 5031-5042.
Höchtl, J., Müller, C. & Zachau, H. G. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 1383-1387.
Hörz, W. & Altenburger, W. (1981). Nucl. Acids Res. 9, 683-696.
Houck, C. M., Rinehart, F. P. & Schmid, C. W. (1979). J. Mol. Biol. 132, 289-306.
Jagadeeswaran, P., Forget, B. G. & Weissman, S. M. (1981). Cell, 26, 141-142.
Kataoka, T., Kawakami, T., Takahashi, N. & Honjo, T. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 919-923.
Kato, N. & Harada, F. (1981). Biochem. Biophys. Res. Commun. 99, 1477-1485.
Kominami, R., Urano, Y., Mishima, Y. & Muramatsu, M. (1981). Nucl. Acids Res. 9, 3219-3233.
Kramerov, D. A., Grigoryan, A. A., Ryskov, A. P. & Georgiev, G. P. (1979). Nucl. Acids Res. 6, 697-713.
Krayev, A. S., Kramerov, D. A., Skryabin, K. G., Ryskov, A. P., Bayev, A. A. & Georgiev, G. P. (1980). Nucl. Acids Res. 8, 1201-1215.
Manuelidis, L. (1980). Nucl. Acids Res. 8, 3247-3258.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Miesfeld, R., Krystal, M. & Arnheim, N. (1981). Nucl. Acids Res. 9, 5931-5947.
Neumaier, P. S. (1981). Ph.D. thesis, Medical Faculty, University of Munich.
Orgel, L. E. & Crick, F. H. C. (1980). Nature (London), 284, 604-607.
Pech, M., Höchtl, J., Schnell, H. & Zachau, H. G. (1981). Nature (London), 291, 668-670.
Reddy, R., Henning, D. & Busch, H. (1979). J. Biol. Chem. 254, 11097-11105.
Rubin, C. M., Houck, C. M., Deininger, P. L., Friedmann, T. & Schmid, C. W. (1980). Nature (London), 284, 372-374.
Singer, M. F. (1982). Int. Rev. Cytol. In the press.
Smith, G. R., Kunes, S. M., Schultz, D. W., Taylor, A. & Triman, K. L. (1981). Cell, 24, 429-436.
Stambrook, P. J. (1981). Biochemistry, 20, 4393-4398.
Steinmetz, M. & Zachau, H. G. (1980). Nucl. Acids Res. 8, 1693-1707.
Steinmetz, M., Altenburger, W. & Zachau, H. G. (1980a). Nucl. Acids Res. 8, 1709-1720.
Steinmetz, M., Höchtl, J., Schnell, H., Gebhard, W. & Zachau, H. G. (1980b). Nucl. Acids Res. 8, 1721-1729.
Steinmetz, M., Frelinger, J. G., Fisher, D., Hunkapiller, T., Pereira, D., Weissman, S. M., Uehara, H., Nathenson, S. & Hood, L. (1981). Cell, 24, 125-134.
van Arsdell, S. W., Denison, R. A., Bernstein, L. B., Weiner, A. M., Manser, T. & Gesteland, R. F. (1981). Cell, 26, 11-17.
Zachau, H. G., Höchtl, J., Neumaier, P. S., Pech, M. & Schnell, H. (1982). In Genome Evolution and Phenotypic Variation (Dover, G. & Flavell, R. A., eds), pp. 193-204, Academic Press, New York.
Edited by P. Chambon
Note added in proof: The orientation of the R1/R2-containing fragments relative to the rest of fragments f-T and L8 was tentative (see the legend to Fig. 1). Additional sequence data now show that it is opposite to the orientation in Fig. 1. From the same sequence data it is apparent that the homology between R1 and R2 (but not the homology to R4) extends a further 30 bp.