GENOMICS 13, 983-990 (1992)

Dissecting (CAC)5/(GTG)5 Multilocus Fingerprints from Man into Individual Locus-Specific, Hypervariable Components

HANS ZISCHLER, CLAUDIA KAMMERBAUER, ROLAND STUDER, K.-H. GRZESCHIK,* AND JÖRG T. EPPLEN¹

Max-Planck-Institut für Psychiatrie, W-8033 Martinsried, Germany, and *Institut für Humangenetik, W-3550 Marburg, Germany

Received July 26, 1991; revised January 16, 1992
Individual components of multilocus fingerprints from man produced by (CAC)5/(GTG)5 oligonucleotides have been scrutinized to characterize their peculiar properties. Successful cloning and changes occurring during the propagation of recombinant simple repetitive DNA in prokaryotic hosts are described. The isolated locus-specific probes were characterized with respect to their formal (and population genetic) properties and their usefulness for individualization and linkage studies. The probes were localized to chromosomes 8, 9, 11, and 22. Repeat flanking sequences were characterized and analyzed for their coding potential because of significant open reading frames and apparent evolutionary conservation among vertebrates. The organization of the repeats and their flanking regions in the human genome is discussed with respect to the sequence (fine) architecture that developed during evolution. Classical “minisatellite” sequences were not detected near hypervariable (cac)n/(gtg)n repeats. The single-copy probes described herein are a convenient complement to the oligonucleotides employed for multilocus fingerprinting. Many practical applications are apparent. © 1992 Academic Press, Inc.

INTRODUCTION

Eukaryotic genomes contain different families of tandemly organized repetitive DNA. The biological meaning of the bulk of repetitive DNA is still largely a matter of speculation. Mediated by structural features and possibly interacting proteins, some elements of tandem repetitive DNA may play a role with respect to gene expression, recombination, and DNA replication or may be involved in chromosome organization (Stallings et al., 1991). Because these sequences are in general polymorphic, they have had a remarkable impact on genetic, forensic, and population biological studies. Based on the lengths of the repeat motifs and the chromosomal distribution, tandemly repetitive DNA can be subdivided into different classes, e.g., the “minisatellites” (Jeffreys et al., 1985) and simple tandem sequences (Epplen, 1988). The latter class consists of short motifs, 1 to approximately 10 bases in length (Skinner, 1977). Apparently, this type of repetitive DNA shows enormous length variation, ranging from short sequences, recently named microsatellites (Weber, 1990), to very long stretches of several kilobases. The detection of such elements using oligonucleotide probes harboring different simple motifs has allowed highly informative DNA fingerprints to be established in each species tested so far (Epplen et al., 1991). The synthetic probe (CAC)5 and its direct complement (GTG)5 generate individual-specific DNA fingerprints in man (Schäfer et al., 1988). This probe has been studied extensively with respect to its somatic as well as germ line stability (Nürnberg et al., 1989). The distribution of the target sequences in the human genome was determined by nonradioactive in situ hybridization, revealing a coincidence of (cac)n/(gtg)n signals with R-bands at a resolution of 200 bands per metaphase (Zischler et al., 1989). To breach the multilocus arrangements, we analyzed individual (cac)n/(gtg)n-bearing DNA loci from the 3- to 10-kb region of a human DNA fingerprint.
MATERIALS AND METHODS

DNA isolation, restriction enzyme digestion, electrophoresis, Southern blotting or gel drying, and probing with either cloned probes or oligonucleotides were carried out according to standard protocols (Sambrook et al., 1989; Schäfer et al., 1988a,b). Unless otherwise indicated, final washing stringencies after hybridization of cloned probes were 0.1× SSC at 68°C. The oligonucleotide probes HZ11o3 and HZ41o3 were labeled at the 5' end via a kinase reaction. Labeled oligonucleotides were separated on a denaturing polyacrylamide gel, and 10^6 cpm per 1 ml of hybridization solution were used. In-gel hybridizations of oligonucleotide probes were carried out for 3 h at Tm - 5°C (HZ11o3, 55°C; and HZ41o3, 53°C) in the presence of 5× SSPE, 5× Denhardt's, 0.1% SDS, and 10 µg/ml fragmented and denatured Escherichia coli DNA. Washing was done with three changes of 6× SSC at room temperature followed by a 1-min stringent wash at the respective hybridization temperature (Thein and Wallace, 1986). Exposure times were 4 and 8 days with intensifying screens for HZ41o3 and HZ11o3, respectively. Plasmid DNA was obtained by the alkaline lysis method (Sambrook et al., 1989) followed by ion-exchange chromatography (Qiagen). Double-strand sequencing was performed by the dideoxy chain termination procedure using the USB sequencing kit.
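The two hybridization temperatures quoted above can be checked against a common rule of thumb. The sketch below assumes the Wallace rule, Tm = 2°C × (A+T) + 4°C × (G+C), which the text does not state explicitly; the probe sequences are those given in the legend to Fig. 3, and the function names are illustrative only.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm (deg C) of a short oligonucleotide with the Wallace rule.

    Assumption: Tm = 2*(A+T) + 4*(G+C); the paper only states 'Tm - 5 deg C'.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    return 2 * at + 4 * gc


def hybridization_temp(oligo: str, offset: int = 5) -> int:
    """Hybridization temperature taken as Tm minus a fixed offset (5 deg C here)."""
    return wallace_tm(oligo) - offset


# Flank-derived probe sequences as printed in the legend to Fig. 3.
PROBES = {
    "HZ11 flank probe": "ATGTAGGCTTAGATTACATAGG",
    "HZ41 flank probe": "AGAGATTTAATTTCACTGAGCA",
}

for name, seq in PROBES.items():
    print(name, len(seq), "nt, Tm", wallace_tm(seq), "-> hybridize at", hybridization_temp(seq))
```

Under this assumption both 22-mers give values that agree with the temperatures stated in the text (55°C and 53°C), which suggests, but does not prove, that this rule was the one applied.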
The sequence data have been submitted to the EMBL database under Accession Nos. X59821-X59830. The D-number assignments are HZ11, D11S859; HZ32, D8S204; HZ41, D9S128; HZ42, D22S265.
¹ To whom correspondence and reprint requests should be addressed at present address: Ruhr-Universität Bochum, Molecular Human Genetics, MA, W-4630 Bochum, Germany.
0888-7543/92 $5.00 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.
Synthesis, deprotection, purification, and labeling of oligonucleotides were performed essentially as described in Schäfer et al. (1988a).
Library construction and screening. DNA from three unrelated persons was pooled, digested to completion with MboI, and run on a preparative agarose gel. The gel slice encompassing the 3- to 10-kb region was isolated and DNA was electroeluted. After ion-exchange purification (Qiagen column) the DNA was ligated into the lambda Zap II vector (Stratagene). Because this vector lacks a unique BamHI site, two alternative approaches were chosen: (i) XhoI half-sites were prepared by partially filling in XhoI-generated, 5'-protruding ends with a final concentration of 1 mM dTTP and 1 mM dCTP using 2 U Klenow polymerase (New England Biolabs) at 37°C for 30 min in 50 mM Tris/HCl (pH 7.2), 10 mM MgSO4, and 1 mM DTT. The linearized vector was dephosphorylated to prevent self-ligation of vector bearing unmodified XhoI sites. Before ligation, the MboI-cut genomic DNA was also partially filled in with dGTP and dATP to produce sticky ends fitting the XhoI half-sites. After overnight ligation at 16°C, the DNA was packaged in vitro (Gigapack Gold, Stratagene) and plated on XL-1 blue cells (recA1, lac-, endA1, gyrA96, thi, hsdR17, supE44, relA1). (ii) The alternative approach included EcoRI-BamHI conversion adaptors, basically following the methodology described in Stover et al. (1987). Two oligonucleotides were synthesized: (A) GAATTCGAACCCCTTCG and (B) 5' GATCCGAAGGGGTTCG. After kinasing A, both oligonucleotides were annealed and ligated to EcoRI-cut lambda Zap II DNA in a thousand-fold molar excess (16°C/3 h). Before the genomic DNA was ligated to the vector, the nonligated adaptor was removed by gel filtration (Sephadex S200, Pharmacia). After in vitro packaging, phages were plated on XL-1 blue cells. Both libraries were screened with 32P-labeled (CAC)5. From 205,000 recombinant clones, six strongly (CAC)5/(GTG)5-positive ones were detected.
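The logic of approach (i) can be illustrated with a small model of partial fill-in. The sketch below is a simplification under stated assumptions: XhoI (C^TCGAG) and MboI (^GATC) leave the 5' overhangs TCGA and GATC, Klenow polymerase extends the recessed 3' end only as long as the required dNTP is supplied, and two 5' overhangs can anneal when one is the reverse complement of the other. The function names are hypothetical and not taken from any library.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang: str, dntps: set) -> str:
    """Return the 5' overhang left after fill-in with a restricted dNTP set.

    `overhang` is written 5'->3'; its last base lies next to the duplex, so the
    polymerase copies it from right to left and stops at the first template
    base whose complementary dNTP is missing.
    """
    remaining = len(overhang)
    while remaining > 0 and COMPLEMENT[overhang[remaining - 1]] in dntps:
        remaining -= 1
    return overhang[:remaining]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs pair if one is the reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


xho_half_site = partial_fill_in("TCGA", {"T", "C"})   # vector, dTTP + dCTP
mbo_half_site = partial_fill_in("GATC", {"G", "A"})   # insert, dGTP + dATP

print(xho_half_site, mbo_half_site)                    # remaining overhangs
print(can_anneal(xho_half_site, mbo_half_site))        # vector + insert
print(can_anneal(xho_half_site, xho_half_site))        # vector + vector
print(can_anneal(mbo_half_site, mbo_half_site))        # insert + insert
```

In this model the filled-in vector and insert ends pair with each other but neither pairs with itself, which is the property the partial fill-in strategy relies on.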
The conversion of the (CAC)5/(GTG)5-positive lambda Zap II clones to pBluescript phagemids was accomplished according to the in vivo excision protocol supplied by the manufacturer.

Exonuclease III/mung bean nuclease treatment. Unidirectional deletions in the plasmid inserts were generated by exonuclease III/mung bean nuclease treatment according to the supplier's protocol (Stratagene). Before self-ligation, the reaction mixtures from each incubation interval were electrophoresed on a preparative LMP agarose gel and the major bands were excised. Thereafter, DNA was purified by adsorption onto glass milk (GeneClean, BIO 101) and religated overnight. Competent JM109 cells (recA, endA, gyrA96, thi, hsdR17, supE44, relA1) were transformed according to Perbal (1988). Several colonies from each time interval were isolated and their plasmid DNA was prepared.

Polymerase chain reaction (PCR). For chromosomal localization of the different probes, DNA from human-rodent hybrid cell lines was subjected either to conventional hybridization or to PCR amplification. Primers were designed according to the repeat flanking regions: HZ11o1, 5' AATTCTGGACAATGAGTAG; HZ11o2, 5' CCAACCTATGTAATCTAAG; HZ41o1, 5' GATCTGAATCTTCAATTTGAC; and HZ41o2, 5' TTGTTAAGGTAGTTTGAGGG. Different final concentrations of Mg2+ (4, 5, and 6 mM) were tested. Before addition of the Taq polymerase (Cetus-Perkin Elmer) and the respective DNA, reaction mixtures were subjected to a 15-min UV irradiation at 260 nm (Sarkar and Sommer, 1990). Standard PCR reactions of 1 µg DNA were carried out with 30 s of denaturation, 1 min of annealing, and 1 min of extension at 72°C. Annealing temperatures were 47°C for HZ11 and 56°C for HZ41. Amplified DNA was subjected to electrophoresis followed by transfer to nylon membranes and challenge with an internal oligonucleotide probe.
[Figure 1: restriction map drawings of the plasmid subclones; size labels legible in the figure are 1.75, 1.97, 1.55, 0.86, and 2.85 kb, with 100-bp scale bars.]

FIG. 1. Restriction maps of the (cac)n/(gtg)n-containing plasmid subclones of phage HZ11, HZ32, HZ41, HZ42, and HZ127. Restriction fragments containing the simple repetitive sequences are marked with + and the vector DNA sequence is marked with V. Black boxes show the repeat flanking subclones. Subclones and sequencing strategies are indicated by interrupted lines and arrows, respectively. Sequences of both strands were obtained for the repeat flanking subclones. A, AluI; Bal, BalI; Bam, BamHI; M, MboI; E, EcoRI; Exo, exonuclease III; Hi, HinfI; P, PvuII; R, RsaI; D, DraI; Hae, HaeIII. The EcoRI sites on the extreme left- and right-hand sides are due to the EcoRI-BamHI conversion adaptor.

RESULTS AND DISCUSSION

Isolation and Sequence Analyses of Locus-Specific Hypervariable (cac)n/(gtg)n Loci

From two size-enriched genomic libraries (established with MboI-digested human DNA) six (cac)n/(gtg)n-containing phage clones were isolated and tested for their
single locus specificity/informativity. Because one isolated phage could not be recovered during rescreening, only five independent (cac)n/(gtg)n-containing clones (HZ11, HZ32, HZ41, HZ42, and HZ127) were fully characterized. Restriction maps of the respective plasmid subclones are displayed in Fig. 1. Restriction enzyme cutting sites are comparatively rare, which accounts for the long DNA fragments obtained after digestion with a number of different (“four base recognition”) endonucleases. Repeat-free flanking sequences were subcloned (black boxes) and designated HZA11, HZA127, HZA32, HZA41, and HZA42. Altogether, sequence information was obtained from 4.8 kb of the flanking and the repetitive parts of HZ11, HZ32, HZ41, HZ42, and HZ127 (see Fig. 2). Comparisons of the (cac)n/(gtg)n flanking subclones to each other or to the sequences in the data banks (EMBL, GenBank) did not reveal homologies. Direct sequence repetitions were excluded in all flanking subclones by dot matrix analyses at 70% stringency. With the exception of HZA42, A/T overabundance of up to 65% was observed in the flanking regions (HZA41, 268 bp).

[Figure 2: nucleotide sequences of the HZ11 partial EcoRI-BalI fragment, the HZ127 HaeIII fragment, the HZ32 HinfI-AluI fragment, the HZ41 RsaI-AluI fragment, and the HZ42 AluI-BalI fragment. The sequence listing is not reproduced here; the sequences are deposited in the EMBL database under Accession Nos. X59821-X59830.]

FIG. 2. The tandem repetitive sequence components of the phage clones isolated from a partial genomic library. HZ11: Partial sequence of the EcoRI-BalI subfragment (see Fig. 1). Dots indicate borders of the respective subclones obtained after exonuclease III treatment. The different repeat motifs are indicated in bold, bold underlined, bold double-underlined, underlined, and italics. HZ127: DNA sequence of the HaeIII subfragment. The central simple repeat stretch consisting of poly(@ unit is bordered by (act), and (& as well as (gas),. HZ32: Nucleotide sequence of the “longer” HinfI-AluI subfragment (starting at the second base of the HinfI recognition site). Seven perfect repeats of the ccaccttccaccacccc sequence are marked as well as seven acctcccc units. HZ41: DNA sequence of the RsaI-AluI subfragment (starting with the second base of the RsaI recognition site). The marked 21-mer ccaccacctccaccatcacca is repeated four times. HZ42: Nucleotide sequence of the AluI-BalI subfragment (starting with the last base of the AluI recognition site). Different higher order 15-mer motifs are indicated in italics, bold double underlined, bold capitals, underlined, double underlined, bold underlined, and italics double underlined.

To investigate the basis for differences in allele sizes of (cac)n/(gtg)n loci, the repeat parts of the inserts were partially sequenced. The corresponding fragments were restriction enzyme digested and subcloned. Parts of the repetitive stretches of HZ41 (RsaI-AluI fragment, see Fig. 1) and the repeat in HZ127 were sequenced completely. Because of the lack of restriction sites, unidirectional deletions were generated by exonuclease III/mung bean nuclease treatment in HZ11, HZ32, and HZ42. Core sequences of the repetitive parts of the clones were determined using the UWGCG programs REPEAT and FIND.

In HZ42 the repeat sequence consists of a reiterating 15-mer, RGGTGGTGGTGATGG, which appears to be deduced from the (tgg)n simple motif. One can envision that after having been created by slippage, the 15-mer motif was first changed by a point mutation. This event was followed by several en bloc duplications to constitute the reiterating 15-mer core motif, which was subsequently changed slightly by point mutations. In the 1.15-kb-long repeat of HZ11, partial sequence information was obtained for about half of the repeat (arrows in Fig. 1). The repeat is not as homogeneous as the former one. With respect to its G-rich strand, the 5' end is composed of a reiterating GGGGGGATGTGGAGGTGGTG motif (including one mismatch). Half of this core represents the simple sequence (A/TGG). The higher order motif again implicates duplication events after establishment of the longer core. In both HZ42 and HZ11, perfect (CAC)5/(GTG)5 targets are present either as parts of the core sequence (HZ42) or in its vicinity (HZ11).

Both HZ32 and HZ41 exhibit other repeat units (CCACCTTCCACCACCCC and CCACCACCTCCACCATCACCA) reiterated perfectly six and four times, respectively. The 17-mer repeat core in HZ32 is interrupted by smaller islands consisting mostly of ACCTCCCC or derivatives. HZ41 is composed mostly of (cac)n/(gtg)n intermingled with the more ambiguous "/TCA/, motif. In HZ32, CCACC is most abundant, intermingled with TT/cCCC/A. Thus, the pyrimidine content of one strand is more than 80% in a stretch of 350 bp. In both probes, no perfect (CAC)5/(GTG)5 target was observed.

In conclusion, a common feature of the repeat parts of all four hypervariable probes is the marked G/C strand asymmetry. Direct repeats themselves, strand asymmetry, and runs of homopurines/pyrimidines (HZ11) may cause unusual DNA structures (Wells et al., 1987). Either perfect simple sequences or higher order derivatives thereof constitute the repeats in the sequences described above. Apparently, these higher order motifs originated from short simple sequences by replication slippage (see poly(G) in the higher order motif of HZ11). Later, longer and more complex sequence motifs were generated by so far ill-defined turnover mechanisms.
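The strand asymmetry described above can be made concrete with a short calculation on the two core motifs quoted in the text. This is only a sketch: the text reports the pyrimidine content for a 350-bp stretch of HZ32, whereas the code looks at single copies of the core units, and the G/C skew measure (G - C)/(G + C) is an assumption chosen for illustration, not a statistic used by the authors.

```python
def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C and T bases in one strand."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    return (seq.count("C") + seq.count("T")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C): -1.0 means all G/C positions on this strand are C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


# Core repeat units as quoted in the text (C-rich strand).
CORES = {
    "HZ32 17-mer": "CCACCTTCCACCACCCC",
    "HZ41 21-mer": "CCACCACCTCCACCATCACCA",
}

for name, core in CORES.items():
    print(f"{name}: pyrimidines {pyrimidine_fraction(core):.1%}, G/C skew {gc_skew(core):+.1f}")
```

Neither core unit contains a single G on the strand shown, so all guanines lie on the complementary strand; the HZ32 unit alone is 14/17 (about 82%) pyrimidine, in line with the figure of more than 80% given for the longer stretch.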
The longer higher order motifs share minisatellite-like properties. Extensive homology searches revealed similarity to some GC-rich stretches of Epstein-Barr and herpes viruses (Cullinane et al., 1988; Mocarski and Roizman, 1981). Furthermore, similarities to proteins that are rich in amino acids encoded by (cac)/(gtg) codons or register shifts or permutations thereof were uncovered. Similarities to classical “minisatellite” sequences were observed; they are due to the GC richness and strand asymmetry. All similarities are restricted to the repetitive stretches and do not extend to the unique flanking sequences. Only the short and ambiguous core sequence GNNGTGGG (Nakamura et al., 1988) could be traced in HZ11, HZ32, and to a lesser extent in HZ42.

HZ127 was the only monomorphic probe that we isolated and analyzed in more detail. The reason for the apparent uniformity of HZ127 may be seen in its repeat structure. Most striking is the short repeat stretch, which is inhomogeneous and consists of four different simple motifs encompassing only about 50 bp. According to Weber (1990), it can be regarded as a composite “microsatellite,” which is possibly prone to replication slippage. Here, only one copy of the (CAC)5/(GTG)5 target with mismatches at the very 5' and 3' ends was identified. Whereas strong signals, possibly due to high titers, were obtained on the phage plaques, the signal intensity was low after the subclones were probed with (CAC)5.

Coding potential analyses and the presence of open reading frames in the (gtg)n/(cac)n-containing clones did not exclude translation. Therefore, total nuclear RNA was extracted from different human tissues and cultured cells. Flanking locus-specific subclones of HZ11, HZ32, HZ41, HZ42, and HZ127 were used as hybridization probes: No transcripts were demonstrable. Nevertheless, either the (CAC)5 or the (GTG)5 oligonucleotide cross-hybridized to either the 18 S or the 28 S rRNA band, depending on the vertebrate species investigated (data not shown).

Cloning Artifacts

Long stretches of tandem or inverted repetitive DNA are not stably propagated in prokaryotic hosts, even when the latter completely lack the ability of homologous recombination due to deficiencies in different Rec gene products (Wyman et al., 1985). The abundant occurrence of insert changes (e.g., collapsing, complex rearrangements) during cloning must be accounted for in the analysis of cloned tandem repetitive DNA and adjacent regions. Vectors such as insertion-type lambda phages allow collapsing and are therefore well suited to cloning tandem repetitive DNA, especially selected fragments after size enrichment. The small insert sizes below 3 kb in HZ11, HZ32, HZ41, and HZ42 may thus be due to collapsing during propagation. Cloning artifacts were proven for HZ42 (and HZ32) by sequence analysis. In the case of HZ11 and HZ41, it can be argued that shorter alleles were cloned accidentally (due to retardation of small fragments in the preparative gel electrophoresis). HZ42 phagemids could not be rescued by in vivo excision.
After isolation of phage DNA and shot-gun cloning of MboI subfragments into pUC19, the sequence analysis of (CAC)5/(GTG)5-containing plasmids showed that human insert DNA was joined to parts of the vector β-lactamase gene (loss of antibiotic resistance). This rearrangement was not restricted to the insert but involved sequences distant to the polylinker. The boundary of insert/β-lactamase sequences is located in a nonrepetitive insert region (Fig. 1). Therefore, simple repeats may cause the rearrangement but they do not determine the final structure. Another artifact was observed in HZ32. Mapping and complete sequencing revealed that the insert consists of two different components connected by a conversion adaptor dimer. Both parts contain identical unique sequences except for one C/T exchange close to the repeat. Yet the two copies differ in the repeat lengths: The smaller one lacks a 168-bp stretch in the repetitive section. Several explanations are possible for this observation. Most likely, two vector-insert molecules were involved in a complex recombination/duplication event.
[Figure 3: hybridization panels for HZ11 and HZ42 (top) and HZ41 and HZ32 (bottom); marker positions at 2.3 and 2.0 kb.]

FIG. 3. MboI-digested and electrophoretically separated DNA of 9 unrelated individuals probed with (top left) HZ11o3 (oligonucleotide deduced from flanking sequence: ATGTAGGCTTAGATTACATAGG), (bottom right) HZA32, (bottom left) HZ41o3 (oligonucleotide deduced from flanking sequence: AGAGATTTAATTTCACTGAGCA), and (top right) HZA42. In the third lane (HZ42), less DNA was loaded. Molecular weight markers are indicated on the right in kb.

In general, inserts that are changed during the initial propagation steps in the prokaryotic host are comparatively stable during all further (sub-)cloning steps (Studer et al., 1991; additional data not shown). The ratio of unique to repeat parts in the inserts is seemingly not very critical. After unidirectional deletions were created in subclones (to the extent of harboring exclusively repetitive DNA), the deletion clones did not show rearrangements relative to the original clones from which they were derived.

Heterozygosity Rates and Allele Size Distributions

For an initial estimation of the heterozygosity rates of the isolated probes, DNA from nine unrelated Caucasians was digested with MboI (also used for library construction) and challenged with the repeat flanking subclones. Because the (small) inserts from HZ11 and HZ41 were not well suited to filter hybridizations, oligonucleotide probes were designed according to their repeat adjacent unique regions. Figure 3 shows the hybridization patterns for probes HZ11o3, HZA32, HZ41o3, and HZA42. At first sight, all these probes clearly display a high degree of polymorphism, with most of the individuals being heterozygous at the respective loci. The samples showing only one allele result from (i) homozygosity at the respective locus, (ii) small length differences not being resolved, or (iii) the second signal band running off the gel. The last explanation is the most likely because of the independently established distribution of the allele lengths, including data from short run gels (Hundrieser et al., 1992). For minimum estimates of heterozygosity, all persons exhibiting one or no band were regarded as homozygous. The preliminary minimum estimates of heterozygosity rates (H) and mean allele frequencies (q) are based on the allele-sharing values (s) and calculated as proposed by Wong et al. (1987) with H = (1 - s)^(1/2) and q = 1 - H. Due to the limited number of samples and limited electrophoretic resolution, the H values are grossly underestimated:

Probe     H (%)          q       Allele sizes (MboI) (kb)
HZ11o1    88             0.12
HZA32     88             0.12
HZ41o1    97             0.03
HZA42     93             0.07
HZ127     Monomorphic
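The relation between allele sharing, heterozygosity, and mean allele frequency used for the table can be written out as a short calculation. The sketch below assumes that the garbled formula in the text reads H = (1 - s)^(1/2) with q = 1 - H, which is equivalent to s = 2q - q^2; the allele-sharing values are not reported in the text, so the inputs here are hypothetical values back-calculated from the tabulated H.

```python
import math


def heterozygosity_from_sharing(s: float) -> tuple:
    """Return (H, q) from the allele-sharing value s.

    Assumes s = 2q - q**2, so that q = 1 - sqrt(1 - s) and H = 1 - q.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("allele sharing must lie between 0 and 1")
    h = math.sqrt(1.0 - s)
    return h, 1.0 - h


# Hypothetical allele-sharing values, chosen so that H matches the table.
for s in (0.2256, 0.0591, 0.1351):
    h, q = heterozygosity_from_sharing(s)
    print(f"s = {s:.4f}  ->  H = {100 * h:.0f}%,  q = {q:.2f}")
```

Because H falls only with the square root of 1 - s, even a sharing value above 20% still corresponds to a heterozygosity close to 90%.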
The respective loci were also examined in three Caucasian families (eight children), and all four, HZ11, HZ32, HZ41, and HZ42, are inherited according to Mendelian rules. Extended family and mutation rate studies as well as screening data from different ethnic groups have been evaluated (Hundrieser et al., 1992).

Chromosomal Assignment
Identical minimum discordancy values were obtained for HZ42 for localization on chromosomes 4, 10, 20, and 22 (Fig. 4). Therefore, a meningioma cell line that lacks one chromosome 22, as revealed by scoring 30 metaphase plates (45,XX,-22), was examined (W. Henn, personal communication). Hybridization of HZ42 to this DNA revealed only one band, pointing to the localization on chromosome 22. These results were confirmed by fine mapping data obtained from challenging hybrid cell lines containing different chromosome 22 fragments, allowing the mapping of HZ42 to 22q12.1-22q13.1 (Blin et al., 1992). For HZ32 the sensitivity could be improved by hybridizing the full-length repeat-containing clone in the presence of human competitor DNA, allowing its mapping to chromosome 8. In the case of HZ41 and HZ11 only ambiguous hybridization results could be obtained. Therefore, primers deduced from the unique parts of the inserts were synthesized and used to amplify small stretches of repeat flanking regions. Amplified DNA containing several fragments of different sizes was subjected to electrophoresis, blotted, and challenged with internal oligonucleotide probes. Bands of uniform size (184 bp) allowed us to map HZ11 to chromosome 11. The data pertaining to HZ41 are initially more difficult to interpret. The amplified region is also present in rodent DNA. Therefore, an essentially identical banding pattern was obtained for each sample, with one clearly predominant band of the expected size (324 bp) in several samples. The same quantitative differences were observed after hybridization with an internal oligonucleotide probe. Based on these quantitative differences, HZ41 can be localized on chromosome 9. The homogeneous sizes of the repeat flanks obtained after PCR amplification reveal the monomorphy of the repeat flank. Different alleles due to different fusion partners for human/rodent cell lines were noticed after conventional blot hybridization.

[Figure 4: presence/absence matrix of human chromosomes 1-22, X, and Y in the hybrid cell lines, with the hybridization results and % discordancy values for clones HZ11, HZ32, HZ41, and HZ42; the matrix is not reproduced here. Chromosome localization: HZ11, 11; HZ32, 8; HZ41, 9; HZ42, 22.]

FIG. 4. Chromosomal assignment of HZ11, HZ32, HZ41, and HZ42 using a panel of somatic cell hybrids (CH). +, Human chromosome present in CH; (+), human chromosome present in <10% of CH; I, presence of isozyme markers in the absence of cytogenetically identifiable chromosome (fragments); nd, not tested. For the calculation of discordancies, positive controls in lanes 1 and 18 as well as negative controls (rat and Chinese hamster DNA) in 13 and 17 were not included. (+) was regarded as chromosome present, whereas I was regarded as chromosome absent. Seemingly unspecific signals in the HZ41 amplification are marked with question marks. The weak hybridization signal obtained for CH 7 with HZ42 is indicated by (+). Note that the inconsistencies in the case of HZ42 were clarified by additional experiments (see text).

Evolutionary Considerations
Simple tandem repetitive DNA is present in a wide range of animal and plant species in different phyla. The high number of repeat motif permutations has allowed establishment of highly informative fingerprints from all species tested so far (Epplen et al., 1991). We examined whether repeat flanking sequences have been conserved during evolution and whether they are located in the vicinity of longer (cac)n/(gtg)n stretches. DNAs of representative vertebrate classes were examined: guppy fish (Poecilia reticulata), newt (Necturus maculosus), snake (Boa constrictor), chicken (Gallus domesticus), mouse (Mus musculus), pig (Sus scrofa), cattle (Bos taurus), rhesus monkey (Macaca mulatta), gibbon (Hylobates syndactylus), orangutan (Pongo pygmaeus), gorilla (Gorilla gorilla), and chimpanzee (Pan troglodytes). Filters were probed with the repeat-free flank subclones of HZ11, HZ32, HZ41, HZ42, and HZ127 at either low (2× SSC/68°C) or intermediate stringency (0.5× SSC/68°C). Figure 5 displays the zoo blot hybridization patterns (0.5× SSC/68°C) of HZ11 (a) and HZ127 (b). The locus-specific subclones of HZ32, HZ11, HZ127, and to a lesser extent HZ42 cross-hybridize to repetitive elements in the P. reticulata and mouse genome. Most complex banding patterns with one or two main bands were obtained in the primates using probes HZA11 and HZA42 and at low stringency using HZA32 (not shown). All could be correlated to (CAC)5/(GTG)5 signals, as revealed by reprobing. The flank of HZ127 again produced many faint bands, some coinciding with (CAC)5/(GTG)5 signals. Thus, all five flanking probe signals point to evolutionary conservation. In searching data banks, we did not find significant homologies of flanking subclones and repetitive DNA from fish, mouse, and cattle. Cross-hybridizations were seen in different primate DNAs. In part, the cross-hybridizing targets of the repeat flanks seem to be linked to (cac)n/(gtg)n repeats, perhaps rendering them informative polymorphic probes.

FIG. 5. DNA of 1, guppy fish; 2, newt; 3, snake; 4, chicken; 5, mouse; 6, pig; 7, cattle; 8, rhesus monkey; 9, gibbon; 10, orangutan; 11, gorilla; 12, chimpanzee; and 13, man restricted with HinfI and electrophoresed. Hybridizations at intermediate stringency (0.5× SSC/68°C) are shown for probes HZ11 (a) and HZ127 (b). Since HinfI cuts into the flanks of HZ127, only small fragments were detected in man. Molecular weight markers are indicated in kilobases of length.

CONCLUSIONS
Thus far, only multilocus fingerprints produced by minisatellites have been dissected into their single constituents. Here, four individual hypervariable (cac)n/(gtg)n probes were located on different human chromosomes. Despite possible cloning artifacts, the hypervariable loci contain simple repetitive DNA or are organized in higher order structures directly deducible from simple motifs. During evolution, such structures may be generated by slippage as well as by less well-defined mechanisms, which appear to result in higher order motifs. Thus, the repeat stretch can be organized homogeneously or it may contain various forms of substructures, including pronounced G/C strand asymmetry. Monomorphic (cac)n/(gtg)n loci tend to contain fewer simple repeat units than hypervariable loci. Among vertebrates the human locus-specific repeat flanks appear conserved to an unexpected
extent, even though they are located immediately adjacent to hypervariable elements with extremely high DNA turnover rates. This fact makes the locus-specific probes valuable tools for all individualization purposes.

ACKNOWLEDGMENTS

We thank Drs. Henn and Zang for providing meningioma cells, Drs. Schempp and Nanda for primate DNA, and Drs. Tautz and Schartl for comments. Multilocus oligonucleotides and monolocus probes are subject to patent applications. This work was supported by the VW-Stiftung.
REFERENCES

Blin, N., Klein, V., Zischler, H., and Epplen, J. T. (1991). Mapping to 22q12.1-q13.1 of a polymorphic locus-specific probe from a human fingerprint. Cytogenet. Cell Genet. 58: 2045.
Cullinane, A. A., Rixon, F. J., and Davison, A. J. (1988). Characterization of the genome of equine Herpesvirus 1 subtype 2. J. Gen. Virol. 69: 1575-1590.
Epplen, J. T. (1988). On simple repeated GATA/GACA sequences: A critical reappraisal. J. Hered. 79: 409-417.
Epplen, J. T., Ammer, H., Epplen, C., Kammerbauer, C., Mitreiter, R., Roewer, L., Schwaiger, W., Steimle, V., Zischler, H., Albert, E., Andreas, A., Beyermann, B., Meyer, W., Buitkamp, J., Nanda, I., Schmid, M., Nürnberg, P., Pena, S. D. J., Poche, H., Sprecher, W., Schartl, M., Weising, K., and Yassouridis, A. (1991). Oligonucleotide fingerprinting using simple repeat motifs: A convenient, ubiquitously applicable method to detect hypervariability for multiple purposes. In “DNA-Fingerprinting: Approaches and Applications” (T. Burke, G. Dolf, A. J. Jeffreys, and R. Wolff, Eds.), pp. 50-69, Birkhäuser Verlag, Basel.
Hundrieser, J., Nürnberg, P., Czeizel, A., Metneki, J., Rothganger, S., Foelske, Ch., Zischler, H., and Epplen, J. T. (1992). Characterization of hypervariable, locus specific probes derived from a (CAC)5/(GTG)5 multilocus fingerprint in various Eurasian populations. Hum. Genet., in press.
Jeffreys, A. J., Wilson, V., Neumann, R., and Keyte, J. (1988). Amplification of human minisatellites by the polymerase chain reaction: Towards DNA fingerprinting of single cells. Nucleic Acids Res. 16: 10953-10971.
Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985). Hypervariable ‘minisatellite’ regions in human DNA. Nature 314: 67-73.
Mocarski, E. S., and Roizman, B. (1981). Site-specific inversion sequence of the herpes simplex virus genome: Domain and structural features. Proc. Natl. Acad. Sci. USA 78: 7047-7051.
Nakamura, Y., Carlson, M., Krapcho, K., Kanamori, M., and White, R. (1988). New approach for isolating VNTR markers. Am. J. Hum. Genet. 43: 854-859.
Nürnberg, P., Roewer, L., Neitzel, H., Sperling, K., Popperl, A., Hundrieser, J., Poche, H., Epplen, C., Zischler, H., and Epplen, J. T. (1989). DNA fingerprinting with the oligonucleotide probe (CAC)5/(GTG)5: Somatic stability and germline mutations. Hum. Genet. 84: 75-78.
Perbal, B. (1988). “A Practical Guide to Molecular Cloning,” Wiley, New York.
Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). “Molecular Cloning: A Laboratory Manual,” CSH Press, Cold Spring Harbor, NY.
Sarkar, G., and Sommer, S. (1990). More light on PCR contamination. Nature 347: 340-341.
Schafer, R., Zischler, H., Birsner, U., Becker, A., and Epplen, J. T. (1988a). Optimized oligonucleotide probes for DNA fingerprinting. Electrophoresis 9: 369-374.
Schafer, R., Zischler, H., and Epplen, J. T. (1988b). (CAC)5, a very informative oligonucleotide probe for DNA fingerprinting. Nucleic Acids Res. 16: 5196.
Skinner, D. (1977). Satellite DNAs. Bioscience 27: 790-796.
Stallings, R. L., Ford, A. F., Nelson, D., Torney, D. C., Hildebrand, C. E., and Moyzis, R. K. (1991). Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics 10: 807-815.
Stover, C. K., Vodkin, M. H., and Oaks, E. V. (1987). Use of conversion adaptors to clone antigen genes in λgt11. Anal. Biochem. 163: 398-407.
Studer, R., Kammerbauer, C., Zischler, H., and Hinkkanen, A. (1991). Highly instable (GATA)n-containing sequences of the mouse during the cloning process. Electrophoresis 12: 153-158.
Tautz, D., and Renz, M. (1984). Simple sequences are ubiquitous repetitive components of eukaryote genomes. Nucleic Acids Res. 12: 4127-4137.
Thein, S. L., and Wallace, R. B. (1986). The use of synthetic oligonucleotides as specific hybridization probes in the diagnosis of genetic disorders. In “Human Genetic Diseases: A Practical Approach” (K. E. Davis, Ed.), pp. 33-50, IRL Press, Oxford.
Weber, J. L. (1990). Human DNA polymorphisms based on length variations in simple-sequence tandem repeats. In “Genome Analysis 1: Genetic and Physical Mapping” (K. E. Davis and S. M. Tilghman, Eds.), pp. 159-181, CSH Laboratory Press, Cold Spring Harbor, NY.
Wells, R. D., Amirhaeri, S., Blaho, J. H., Collier, D. A., Hanvey, C. J., Hsieh, W. T., Jaworski, A., Klysik, J., Larson, J. E., McLean, M. J., Wohlrab, F., and Zacharias, W. (1987). Unusual DNA structures and the probes used for their detection. In “Unusual DNA Structures” (R. D. Wells and S. C. Harvey, Eds.), pp. 1-21, Springer Verlag, New York.
Wong, Z., Wilson, V., Patel, I., Povey, S., and Jeffreys, A. J. (1987). Characterization of a panel of highly variable minisatellites cloned from human DNA. Ann. Hum. Genet. 51: 269-288.
Wyman, A. R., Wolfe, L. B., and Botstein, D. (1985). Propagation of some human DNA sequences in bacteriophage lambda vectors requires mutant Escherichia coli hosts. Proc. Natl. Acad. Sci. USA 82: 2880-2884.
Zischler, H., Nanda, I., Schafer, R., Schmid, M., and Epplen, J. T. (1989). Digoxigenated oligonucleotide probes specific for simple repeats in DNA fingerprinting and hybridization in situ. Hum. Genet. 82: 227-233.