Genetic Analysis: Biomolecular Engineering 13 (1996) 49-58
Subtractive hybridization, a technique for extraction of DNA sequences distinguishing two closely related genomes: critical analysis
Olga D. Ermolaeva, Eugene D. Sverdlov*
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10 Miklukho-Maklaya, Moscow 117871, Russia
Received 30 October 1995; revised 21 February 1996; accepted 15 April 1996
Abstract
The present status of genomic DNA subtraction techniques is reviewed. The advantages and disadvantages of the widely used methods of genome subtraction are discussed. Using the kinetic model of subtractive hybridization that we developed previously (Sverdlov and Ermolaeva, 1993; Sverdlov and Ermolaeva, 1994), the application of genome subtraction to various problems is analyzed. It is concluded that the technique should be developed further on the basis of subtraction of single-stranded DNAs. This strategy would enable one to extract target sequences efficiently while omitting the stage of genome simplification.
Keywords: Subtractive hybridization; Enrichment; Kinetics
1. Introduction
The identification of the genomic differences that underlie the phenotypic features distinguishing the objects under comparison (be they single cells or multicellular organisms) is a fundamental problem of molecular biology and genetics. It requires efficient methods for comparing the structure of genomes and the products of their expression. Such comparisons are important for molecular-evolutionary studies and for understanding the genetic background of species, racial and individual differences, the genetic determinants of hereditary diseases, and malignant transformation and metastasis. Genomes can be physically compared at different levels, from cytogenetic analysis of chromosomes to direct comparison of genomic nucleotide sequences or their transcripts. The known methods for differential analysis of genomes and their expression products can be conditionally classified into two groups: descriptive comparisons and isolational comparisons. The first group includes methods that reveal differences upon direct comparison. It can in turn be subdivided into two large subgroups: (1) analysis of differences at defined genomic loci, for example restriction fragment length polymorphism (RFLP) analysis (McKusick, 1991); (2) gross analysis of the features distinguishing whole genomes, which includes DNA fingerprinting (Tautz, 1990), arbitrarily primed PCR (AP-PCR, RAPD) (Williams et al., 1990; Welsh and McClelland, 1990) and other methods. The second group of methods is aimed at the physical isolation and analysis of the DNA fragments representing genomic differences. Subtractive hybridization methods belong to this group. Subtractive hybridization is becoming increasingly important among the methods used to isolate sequences that distinguish closely related genomes, or mRNAs present in one cell type but absent from another. It enables one to construct differential libraries of genomic DNAs or their transcripts directly, and thus to characterise the differences at the structural level, which is necessary for further functional analysis.

Over the last 2 or 3 years the method has been considerably advanced (see Myers (1993) for a brief review) and used to obtain important results, some of which are reviewed below. At the same time we believe that the potential of the method is still far from exhausted, and that it could be even more efficient if certain features of the reassociation kinetics of complementary DNA strands were taken into account. To identify the most promising directions for refining the method further, we analysed the reassociation kinetics both in the most widely used subtraction strategies and in possible but less popular ones. The present work is a review only in the sense that it demonstrates the fundamental principles underlying current subtractive hybridization techniques. No attempt has been made to present a complete review of the literature, which is vast and rapidly growing. Instead, we have analysed only the key pioneering works and the most important contributions that radically improved the method. Although we realise that a critical review is itself an object of criticism, we hope that it will draw attention to the problem and help to find optimum solutions.

* Corresponding author. Tel. & fax: +7 095 330 6538.
1050-3862/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
PII S1050-3862(95)00152-0
2. A brief description of the method
We shall use the following terms: target, the DNA sequence(s) to be isolated; tracer, DNA (genomic DNA, cDNA etc.) containing the target; driver, closely related DNA that differs from tracer in the target sequences. Ideally, but not always, the target sequences are present in tracer and absent from driver. In principle, subtractive hybridization can be used to extract targets of different classes. There are at least three of them: absolute target, present in tracer but absent from driver; amplified target, present in both tracer and driver but with different copy numbers (this is typical when subtracting cDNA libraries of two closely related cell types to detect differences in gene expression levels); and non-identical target, present in both tracer and driver but in not fully identical forms. At the first stage of subtractive hybridization, tracer and driver DNAs are fragmented, e.g. with restriction endonucleases. Fragmented tracer DNA is mixed with a large excess of driver DNA, and the mixture is first heated to denature the DNA and then cooled to reanneal the denatured fragments. The basic idea of subtractive hybridization is that tracer DNA will primarily reassociate with the excess driver DNA, while target sequences having (ideally) no counterparts in driver will inevitably either reassociate with each other or (when the tracer is in single-stranded form) remain single-stranded. The reassociated fragments common to driver and tracer are discarded, and the remaining DNA, enriched in target sequences, is cloned and analysed.
3. Three stages in development of the method
The principles of subtractive hybridization appear to have been first put into practice by Bautz and Reilly (1966). These authors used a deletion of the rII region of bacteriophage T4 to isolate the corresponding mRNA, hybridizing the mutant DNA with mRNAs transcribed from wild-type T4 r+ DNA. In conclusion they, probably too modestly, pointed out that '... This deletion method might be successfully applied to the isolation of operon- or even gene-specific bacterial messengers with but little improvement of our technique'. In the 30 years since then, the method of subtractive hybridization has gone through three periods, which are outlined below. We consider only applications of the method to complex mammalian genomes, except for works in which the technique was fundamentally modified.
[Figure: female DNA is sonicated and male DNA is digested with MboI; the two are mixed at 100:1, denatured and annealed, and ds DNA is isolated (duplex types 1, 2 and 3); fragments are ligated into BamHI-cut pBR322 and cloned to give a Y-enriched library.]
Fig. 1. Genome subtraction scheme used by Lamar and Palmer (1984).
[Figure: Sau3A-digested tracer is ligated to nonphosphorylated adaptor strands and the unligated oligonucleotides are removed; tracer is denatured and reannealed with sheared, biotinylated driver; driver and hybrids are extracted with streptavidin, leaving ss and ds tracer; ends are filled in with Taq polymerase and dNTPs, and the filled-in ds tracer is amplified by PCR with the adaptor primer to give the amplified ds fraction.]
Fig. 2. Genome subtraction scheme based on PCR-amplification of double-stranded tracer (Lisitsyn et al., 1993a).

3.1. Pre-PCR stage
Lamar and Palmer (1984) probably pioneered the application of subtractive hybridization to complex genomes. They obtained a library enriched in mouse Y chromosome fragments. The DNA of female BALB/c mice was randomly fragmented, while the DNA of males was cleaved with the restriction endonuclease MboI. Fragments of the former and the latter DNA were mixed at a ratio of 100:1, denatured and then reannealed up to C0t = 1320 mol·s·l^-1 (where C0 is the molar concentration of nucleotides and t is the reaction time), at which point reassociation is 95% complete. In our terms, the female and male DNA served as driver and tracer, respectively. The calculated reassociation rate constant, 1.44 × 10^-2 l·(mol nucleotide)^-1·s^-1, was about 30 times greater than that cited elsewhere (Britten and Davidson, 1985). This increase is explained by the special buffer used by Lamar and Palmer, containing 2 M (NH4)2SO4, which increases the rate of reassociation approximately 50-fold; this rate is crucial for successful subtraction (see Section 4 and Section 5).
The resulting double-stranded DNA included three types of molecules (Fig. 1); only one of them had both chains derived from the initial MboI tracer fragments. This fraction of reassociated tracer was enriched in Y chromosome fragments and differed from the other fractions in carrying sticky MboI ends at both ends of the fragments. The fragments with sticky ends were selectively cloned into the pBR322 vector cleaved with the restriction endonuclease BamHI. A library of 25 000 recombinant clones containing genomic tracer fragments was obtained. Approximately 13% of the clones contained Y chromosome DNA. It had been shown earlier (Bishop et al., 1983) that about 20% of Y chromosome sequences do not cross-hybridize with DNA of other chromosomes, while 80% of the sequences are shared by males and females. Therefore only the 20% of chromosome-specific fragments could be enriched. Since the Y chromosome constitutes about 1.68% of the 3 × 10^9 bp genome (Therman, 1986), the Y-specific 20% of its fragments should occur in an unenriched genomic library at a frequency of about 0.33%. The observed 13% in the subtracted library thus corresponds to an enrichment of about 40-fold.
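The two numerical claims about the Lamar and Palmer (1984) experiment, 95% reassociation at C0t = 1320 and a roughly 40-fold enrichment of Y-specific clones, can be checked with a few lines of arithmetic. The sketch below is ours, not the authors': it assumes ideal second-order kinetics, for which the fraction of DNA remaining single-stranded is 1/(1 + kC0t), and it uses the rate constant, C0t, chromosome share and clone frequency quoted in the text. The function names are hypothetical.

```python
def fraction_single_stranded(k: float, c0t: float) -> float:
    """Fraction of DNA still single-stranded under ideal second-order
    reassociation: S/S0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def fold_enrichment(observed_fraction: float, expected_fraction: float) -> float:
    """Target frequency in the subtracted library divided by the
    frequency expected in an unenriched library."""
    return observed_fraction / expected_fraction


# Rate constant in l/(mol nucleotide * s) and C0t in mol*s/l, as quoted.
k = 1.44e-2
c0t = 1320.0
completion = 1.0 - fraction_single_stranded(k, c0t)

# Y chromosome ~1.68% of the genome, of which ~20% is Y-specific.
expected = 0.0168 * 0.20
observed = 0.13  # ~13% of clones contained Y chromosome DNA

print(f"reassociation complete: {completion:.1%}")
print(f"expected Y-specific frequency: {expected:.2%}")
print(f"fold enrichment: {fold_enrichment(observed, expected):.0f}")
```

Run as is, it reports 95.0% reassociation and an expected Y-specific frequency of 0.34% (rounded to 0.33% in the text), giving a 39-fold enrichment, consistent with the "about 40-fold" estimate above.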
[Figure: tracer and driver amplicons are prepared; the tracer amplicon is ligated to a dephosphorylated adaptor, mixed with an excess of driver amplicon, melted and annealed; after the ends are filled in, the different classes of molecules (ds tracer, ss tracer, hybrids, ss and ds driver) undergo exponential, linear or no amplification; ss DNA is digested with mung bean nuclease and the remainder is PCR-amplified to give a difference product enriched in target, which is digested with a restriction endonuclease, cloned and analysed.]
Fig. 3. Genome subtraction scheme using repeated amplification rounds (Lisitsyn et al., 1993b).
[Figure: bar charts of the percentage of target, unique and other DNA sequences; panel a, 1st to 3rd rounds; panel b, 1st to 4th rounds.]
Fig. 4. Percentage of target, unique and other DNA sequences after the 1st, 2nd, and 3rd rounds of hybridization under the genome subtraction conditions used by Lisitsyn et al. (1993b). Double-stranded driver and tracer (traditional subtraction) at concentrations of 10 and 0.1 mg/ml, respectively; 20 h hybridization after 10-fold preliminary simplification of the genome (a) and without preliminary simplification (b).

The work of Lamar and Palmer established the basic strategy used in subsequent work both before and after the development of PCR. The strategy was built on the hybridization of double-stranded driver and tracer, with the reannealed double-stranded fraction, enriched in target sequences, used for further operations. Some authors still use variations of the Lamar and Palmer method that separate the fraction of self-reannealed tracer fragments by means of specific ends regenerated upon reassociation. Below we shall show that this strategy is not optimal; for the moment we simply stress that the efficiency of any strategy depends crucially on the rate of reassociation. Lamar and Palmer increased this rate using a special buffer. The authors of another work (Kunkel et al., 1985) used the phenol-enhanced reassociation technique, which can increase the DNA reassociation rate by three or even four orders of magnitude. This work played an important role in localizing the Duchenne muscular dystrophy gene. DNA from a 49,XXXXY lymphoid cell line was used as tracer, and driver DNA was taken from a patient with a 3500 kb deletion (van Ommen et al., 1986) in the short arm of the X chromosome. The strategy was exactly the same as in the work of Lamar and Palmer, including the use of the same restriction endonuclease (MboI) to digest the tracer. The library obtained contained about 5% target-specific DNA fragments, corresponding to an 18-fold enrichment. Hence, despite the high reassociation rate, the enrichment of the double-stranded fraction was low.
Table 1
Comparative target enrichment values for the subtraction of human genomes according to subtraction schemes 3 (single-stranded complementary driver and tracer) and 1 (double-stranded driver and tracer)

Time (h) | Enrichment in ss fraction, ss tracer : ss driver (scheme 3) | Enrichment in ds fraction, ds tracer : ds driver (scheme 1)
1  | 4.0          | 2.4
3  | 6.3 × 10^1   | 5.0
5  | 1.0 × 10^3   | 7.4
7  | 1.6 × 10^4   | 9.7
9  | 2.5 × 10^5   | 11.9
11 | 3.9 × 10^6   | 13.9
13 | 6.2 × 10^7   | 15.8
15 | 9.7 × 10^8   | 17.6
17 | 1.5 × 10^10  | 19.3
19 | 2.4 × 10^11  | 21.0
20 | 9.6 × 10^11  | 21.7

Subtraction conditions were taken from the paper of Lisitsyn et al. (1993b). Concentrations of driver and tracer were 10 and 0.1 mg/ml, respectively, after a 10-fold preliminary simplification of the genome.

Within the framework of the same strategy, but using a different restriction endonuclease to fragment the tracer, a library enriched in the sequences deleted from the genome of a patient with choroideremia, deafness and mental retardation was obtained (Nussbaum et al., 1987). The size of the deletion in this case was probably about 1.25 × 10^6 bp. The library contained about 3% of the clones of interest, whereas the initial content of the corresponding sequence (present in 4 copies in the diploid genome of 6 × 10^9 bp) would be 0.08%. The enrichment achieved was thus about 36-fold. A slightly modified reassociation method was employed to clone multiply amplified sequences from human neuroblastoma (Shiloh et al., 1987) and gastric carcinoma cell lines (Mor et al., 1991). It was assumed that the high copy numbers of the amplified sequences would increase their rates of reassociation. Indeed, the content of the desired clones in the libraries obtained was rather high, although the published data are insufficient to assess the extent of enrichment. The same strategy could probably be used not only to compare DNAs from different individuals of the same species, but also to analyse differences between closely related species. For example, we used it in an attempt to identify human-specific sequences by subtractive hybridization of human and chimpanzee DNAs, which resulted in the isolation of a human-specific repeat (Lisitsyn et al., 1990). The enrichments of target sequences achieved in the works considered above were rather low, at least markedly lower than could be expected from the driver to tracer ratios. This is, however, not surprising, since the theoretical enrichment values can be reached only at very high degrees of driver reassociation. Naturally enough, there were attempts to improve the enrichment, and the most successful improvement was the use of PCR to amplify the enriched fraction.

3.2. PCR-based subtraction
It appears that two groups began to use PCR for subtractive hybridization of genomes independently and at about the same time: the group of Frederick Ausubel in Boston (Straus and Ausubel, 1990) and that of Michael Wigler at Cold Spring Harbor Laboratory (Wieland et al., 1990). Wigler's group was working with the most complex system, the subtraction of human genomes. Both groups subtracted double-stranded tracer against double-stranded driver. The enriched target in the work of Straus and Ausubel (1990) was obtained from the reannealed double-stranded fraction, while Wigler's group worked with the enriched single-stranded fraction, which had not had enough time to reassociate. As shown below, this difference in strategy is of no importance as far as the extent of enrichment is concerned. Several consecutive cycles of subtraction were used in these works. An example of consecutive subtractions is illustrated in Fig. 2, which presents the scheme of one of our experiments (Lisitsyn et al., 1993a). To separate tracer selectively from driver (Wieland et al., 1990), or tracer from driver and the hybrids formed (Straus and Ausubel, 1990; Lisitsyn et al., 1993a), biotinylation of tracer or driver was used in all these studies. Finally, the groups used PCR to amplify the enriched tracer after repeated subtraction rounds. In the work of Wieland et al. (1990) the linkers for PCR were ligated prior to subtraction, whereas Straus and Ausubel (1990) carried out the ligation after all the subtraction rounds were completed. In contrast to these two schemes, we (Lisitsyn et al., 1993a) developed a protocol using PCR linkers constructed in such a way that, after ligation to tracer, they formed 5'-protruding single-stranded tails. This was done by using dephosphorylated adaptors. In this way only one of the two adaptor strands could be covalently joined during ligation to the DNA fragments, which carried 5'-phosphate groups.
Therefore, only reannealed double-stranded tracer, enriched in target sequences, carried these tails at both ends. After the ends were filled in with DNA polymerase, the resulting tailed tracer molecules could be exponentially amplified by PCR using primers corresponding to the adaptor oligonucleotide covalently joined to the 5'-ends of the tracer fragments. This experimental variation was an important difference that allowed us to amplify the reannealed double-stranded tracer fragments selectively, discriminating them from single-stranded, hybrid and driver molecules. In principle, ligation of linkers after subtraction (Straus and Ausubel, 1990) should have the same effect. The PCR linkers used in our work played the same role as the MboI ends of the renatured tracer in the work of Lamar and Palmer (1984), which likewise allowed tracer to be discriminated from the other components of the renaturation mixture. The same tailing procedure was subsequently used in the most recent method of subtractive hybridization (Lisitsyn et al., 1993b). These innovations markedly increased the extent of enrichment. In particular, in the work of Wigler's group, target DNA fragments from the Duchenne muscular dystrophy locus were enriched approximately 100-fold by subtracting normal human DNA against DNA with a deletion of about 1 × 10^6 bp in this locus. This method was also successfully used to obtain a genomic library enriched in sequences deleted in lung cancer (Wieland et al., 1992), with an enrichment as high as 450-fold. Despite all the improvements achieved in these procedures, the absolute values of enrichment were still too low to detect small deletions in organisms with complex genomes such as humans.
3.3. Modern advanced PCR-subtraction
[Figure: two panels plotting the percentage of target and unique DNA sequences against hybridization time (h).]
Fig. 5. Percentage of target and unique DNA sequences after the 1st round of hybridization according to subtraction scheme 3 (single-stranded complementary driver and tracer). Subtraction conditions used by Lisitsyn et al. (1993b). Concentrations of driver and tracer were 10 and 0.1 mg/ml, respectively, after a 10-fold preliminary simplification (a) and without preliminary simplification of the genome (b).

An essential breakthrough in the problem of enrichment was recently achieved by Wigler's group (Lisitsyn et al., 1993b), who used a more advanced subtraction scheme (Fig. 3). The sophisticated technique was named Representational Difference Analysis, or RDA. First (and probably most importantly), selective amplification of driver and tracer DNA was introduced to reduce the complexity of the genomes to be compared. To this end, driver and tracer DNAs were cleaved separately with a restriction endonuclease (BamHI, BglII or HindIII). The fragments obtained were ligated to oligonucleotide adaptors and then PCR-amplified using primers complementary to the adaptors. After 20 amplification cycles, fragments below 1 kb in size predominated. The resulting driver and tracer were more or less random samples of the initial pool of fragments and represented simplified genomes. The extent of simplification (considered here as the reduction in complexity) depended on the DNA cleavage procedure: for the BamHI, BglII and HindIII restriction endonucleases it was 55-, 13- and 8-fold, respectively. The simplified fragmented genomes, termed amplicons, were used for subtraction. Apart from their lower complexity, they differed from the initial mixture in two additional important characteristics: (1) the sizes of individual fragments were more or less homogeneous, i.e. close to each other; (2) they were fragments that can be amplified by PCR. Which of the three characteristics is of primary importance is unclear. We think that the last characteristic may be far from the least important. In view of this, we would prefer to use the term selecton for the product of the initial amplification. After amplification, the adaptors in the driver and tracer amplicons were cut off with restriction endonucleases, and the resulting tracer fragments were religated to new adaptors as in our earlier scheme (Fig. 2). The resulting tracer fragments differed from those of driver by their 5'-protruding tails. Subsequent steps, including subtraction and filling-in of the protruding ends, coincided with our previous scheme (Lisitsyn et al., 1993a). But instead of removing driver, the authors (Lisitsyn et al., 1993b) directly amplified the reannealed tracer molecules, since only this fraction contained structures with both ends complementary to the primer used. Single-stranded driver fragments, incapable of exponential amplification, as well as unreannealed tracer fragments were cleaved by a nuclease specific for single-stranded DNA. The double-stranded fragments that survived this treatment were amplified again, and the adaptors were then cleaved away with a restriction endonuclease. After this, new adaptors constructed analogously, but with different primary structures, were ligated. The newly obtained tracer was again subtracted against a new portion of driver, and the procedure was repeated. Each new round of subtraction produced additional enrichment in target sequences, not only by virtue of the subtraction itself but also because of the higher reassociation rate of the enriched fragments, in accordance with second-order kinetics. It should be noted that this kinetic component should be equally effective in all protocols that use repeated cycles of subtraction to enrich the double-stranded fraction (e.g. Straus and Ausubel, 1990; Lisitsyn et al., 1993a). The possibility of exploiting the second-order kinetics of reassociation was also mentioned in the work of Wigler's group (Wieland et al., 1990). Taken together, all these expedients and advances made it possible to obtain very high degrees of enrichment. The degree of target enrichment in a model system was claimed to be 'greater than 5 × 10^6 fold from the starting material, and... about 4 × 10^5 fold from amplicons' (Lisitsyn et al., 1993b).
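The selectivity of this step rests on the difference between exponential and linear amplification. The idealized sketch below is ours and assumes perfect PCR efficiency. It assigns each class of molecule an amplification mode following our reading of Fig. 3: tracer:tracer duplexes, which carry primer sites at both ends, double every cycle; tracer:driver hybrids, which can be primed on one strand only, gain one copy per cycle; driver duplexes and single-stranded driver are not amplified. The class labels, function names and starting amounts are hypothetical and only illustrate a large driver excess. Because the linear single-stranded products are counted rather than removed by mung bean nuclease, the sketch, if anything, understates the selectivity.

```python
from enum import Enum


class Mode(Enum):
    EXPONENTIAL = "exponential"  # primer sites on both ends of both strands
    LINEAR = "linear"            # only one strand can be primed
    NONE = "none"                # no adaptor-derived primer sites


# Idealized amplification mode of each class after fill-in (our reading of
# the RDA scheme; perfect PCR efficiency assumed).
MODES = {
    "tracer:tracer": Mode.EXPONENTIAL,
    "tracer:driver": Mode.LINEAR,
    "driver:driver": Mode.NONE,
    "ss driver": Mode.NONE,
}


def copies_after(initial, mode, cycles):
    """Number of copies after `cycles` rounds of idealized PCR."""
    if mode is Mode.EXPONENTIAL:
        return initial * 2 ** cycles   # doubles every cycle
    if mode is Mode.LINEAR:
        return initial * (1 + cycles)  # one new strand per cycle
    return initial                     # not amplified


def tracer_tracer_share(amounts, cycles):
    """Fraction of all molecules derived from tracer:tracer duplexes."""
    final = {cls: copies_after(n, MODES[cls], cycles) for cls, n in amounts.items()}
    return final["tracer:tracer"] / sum(final.values())


# Hypothetical relative amounts after hybridization with excess driver.
start = {"tracer:tracer": 1, "tracer:driver": 100, "driver:driver": 10_000}
for n in (0, 10, 20):
    print(f"{n:2d} cycles: tracer:tracer share = {tracer_tracer_share(start, n):.4f}")
```

With these made-up numbers, tracer:tracer molecules go from about 0.01% of the mixture before PCR to nearly 99% after 20 cycles, which is the qualitative point of amplifying the reannealed tracer directly instead of removing driver.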
The published data demonstrated a degree of enrichment much higher than that obtained so far by any other method, and already quite sufficient for comparing complex genomes.
3.4. Some recent developments
Since its appearance, RDA has been successfully used to identify a homozygous deletion in pancreatic carcinoma (Schutte et al., 1995) and to clone probes that detect DNA loss and amplification in tumors (Lisitsyn et al., 1995). A new technique, genetically directed representational difference analysis (GDRDA), was developed specifically to generate genetic markers linked to traits of interest (Lisitsyn et al., 1994). Rosenberg et al. (1994) described RFLP subtraction, a method based on subtractive hybridization and designed to purify smaller restriction fragments from a complex genome if they lack a counterpart of the same size class in a competing genome. Reviews of RDA and related techniques were published in 1995 (Lisitsyn and Wigler, 1995; Jonsson and Weissman, 1995). These examples clearly demonstrate the successful application of RDA to many systems.
3.5. Problems and questions
Despite the evidently successful application of RDA and other similar versions of subtraction, a number of problems remain to be solved. The first important question arising from the papers considered above is to what extent the selective amplification used by the authors is actually representative of the genome. The selectively amplified material could, for example, lack fragments that fall within the < 1 kb size range but are amplified less efficiently for some reason, such as peculiarities of their secondary structure. In view of the known 'mosaic' structure of the human genome, composed of blocks with different G + C content (Bernardi, 1989), the question is whether the amplicons represent the different blocks of the genome equally. If they do not, the isolation of small genome differences using this technique might in some cases be a matter of luck. The next question we will discuss is intimately related to the first one: what is the main role of the simplification procedure? Could it be the selection of fractions whose subsequent renaturation is somehow facilitated? This possibility is easy to conceive. If polynucleotide chains with no internal secondary structure are favoured during PCR amplification and selection, it seems reasonable to assume that they will also reassociate better. If the reassociation rate constant for this fraction is twice the average value, then after two rounds of subtraction the enrichment of target for such amplicons will be an order of magnitude higher than average (see below). This assumption is rather likely, since genomic DNA and even cDNAs are known to contain very slowly reannealing fractions (Fargnoli et al., 1990).
Limitations of RDA were also discussed by Schutte et al. (1995), who used this method to detect DNA sequences deleted in tumors. They pointed out, in particular, that RDA identifies a simple loss of heterozygosity (that is, deletion of only one allele) only if the restriction fragments from the region of interest, produced by the restriction endonuclease used for the representation, differ between driver and tracer DNAs in such a way that the smaller fragment is deleted in driver and is therefore present only in the representation of tracer (the 'tester'). Moreover, because of the size selection of fragments during the many PCR cycles, even regions homozygously deleted in driver DNA can be missed if the restriction fragments from the corresponding region of tracer DNA are too large to be selected during PCR-based representation. This problem can probably be solved by performing RDA with several restriction enzymes, but this can considerably complicate the technique.
Finally, we pose the last and most important question. Despite all their differences, the methods described above were based on the same strategy: subtraction of double-stranded driver and tracer. The question is whether this strategy is the optimum. To answer it, we had to examine the kinetic aspects of various subtraction methods and to compare the degrees of enrichment they can attain.
4. Kinetic models of subtractive hybridization
Earlier we described kinetic models of subtractive hybridization processes and the SUBTRACT software designed to simulate them (Sverdlov and Ermolaeva, 1993; Sverdlov and Ermolaeva, 1994; Ermolaeva and Wagner, 1995). The models are briefly summarized below. The following designations will be used: (1) D, T, U, W, molar concentrations of fragments of driver, fragments of tracer that are present in driver, fragments of target in tracer, and fragments of target in driver, respectively; (2) d, s, superscripts indicating double-stranded and single-stranded DNA, for example U^s, T^d, D^s; (3) U^s(t), D^s(t), concentrations as a function of time; (4) U0, T0, initial concentrations of single-stranded DNA; (5) E_s, E_d, the enrichment in single-stranded and double-stranded DNA, respectively;
(6) $R$, M$^{-1}$ s$^{-1}$, reassociation rate. $R$ is approximately $10^6$ M$^{-1}$ s$^{-1}$ for fragments about 500 nucleotides long in 0.18 M NaCl at the optimum temperature, 25°C below the melting temperature (Britten and Davidson, 1985).

All the methods described above are built on the same strategy of subtracting double-stranded driver from double-stranded tracer. Fragments of double-stranded driver and double-stranded tracer DNA are mixed at $D_0 \gg T_0$, denatured and then allowed to reassociate. This initiates three processes: (1) reassociation of tracer, whose fragments anneal with each other and with driver strands; (2) reassociation of driver, whose fragments anneal with each other and with tracer strands; (3) reassociation of target, forming double-stranded fragments. Assuming that the rates of driver-driver, tracer-tracer and driver-tracer duplex formation are described by second-order equations, the kinetics of subtractive hybridization for a target present in the tracer but absent from the driver are defined by the following system of differential equations:

$$
\begin{aligned}
\frac{dD_s(t)}{dt} &= -R\,D_s(t)D_s(t) - R\,D_s(t)T_s(t), & \frac{dD_d(t)}{dt} &= R\,D_s(t)D_s(t),\\
\frac{dT_s(t)}{dt} &= -R\,T_s(t)T_s(t) - R\,D_s(t)T_s(t), & \frac{dT_d(t)}{dt} &= R\,T_s(t)T_s(t),\\
\frac{dU_s(t)}{dt} &= -R\,U_s(t)U_s(t), & \frac{dU_d(t)}{dt} &= R\,U_s(t)U_s(t),
\end{aligned}
$$

$$D_s(0) = D_0,\quad D_d(0) = 0,\quad T_s(0) = T_0,\quad T_d(0) = 0,\quad U_s(0) = U_0,\quad U_d(0) = 0.$$

The enrichments of target in the single- and double-stranded fractions are then

$$E_s(t) = \frac{U_s(t)}{T_s(t)}, \qquad E_d(t) = \frac{U_d(t)}{T_d(t)}.$$

It should be mentioned that subtraction of double-stranded against double-stranded DNA was also analysed by Milner et al. (1995), who obtained similar formulas. However, subtraction of double-stranded driver and tracer is only one of the possible subtraction schemes. In particular, when tracer and driver are complementary single strands, the target sequences have no counterparts and will therefore be greatly enriched after hybridization and removal of the duplex fragments.

5. An analysis of different subtraction schemes and possible new ways of genome subtraction

Accordingly, we developed models for subtracting genomes and cDNA libraries for all three classes of targets (absolute, amplified and non-identical) using the following five subtraction schemes: (1) double-stranded tracer and double-stranded driver (the most commonly used scheme); (2) single-stranded tracer and double-stranded driver; (3) single-stranded complementary tracer and driver; (4) double-stranded tracer and driver unable to self-reassociate; (5) double-stranded tracer and single-stranded driver. All the models assume that the rates of driver-driver, tracer-tracer and driver-tracer duplex formation are described by second-order equations. The models take into account the variable factors affecting the experimental results, such as the ionic strength of the solution, the incubation temperature and the size of the DNA fragments. The system of equations was solved for the case when the concentration of driver far exceeds that of tracer ($D_0 \gg T_0$), and the concentrations of the components of the DNA mixture and the corresponding enrichment values were derived as functions of time. Formulas for the absolute target enrichment in the single- and double-stranded fractions are presented in Appendix A. To facilitate calculations with these formulas, we have developed a computer program that simulates subtractive hybridization (Ermolaeva and Wagner, 1995). Its user-friendly interface is suitable even for inexperienced users, and it allows one to choose the optimum reaction conditions and subtraction strategy.
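As a minimal, independent sketch of this kind of calculation (not the authors' program), the scheme-1 system can be integrated numerically with a hand-written fourth-order Runge-Kutta step and compared with the closed-form result for $D_0 \gg T_0$, namely $E_s(t) = f(1+RD_0t)/(1+RU_0t)$ and $E_d(t) = f^2(1+RD_0t)/(1+RU_0t)$, taking $f = U_0/T_0$. The function names, the step count and the dimensionless parameter values used below are illustrative assumptions, not values from the paper.

```python
def rates(y, R):
    """Right-hand side of the scheme-1 system (ds tracer, ds driver).

    y = (D_s, D_d, T_s, T_d, U_s, U_d); one rate constant R for every duplex type.
    """
    Ds, Dd, Ts, Td, Us, Ud = y
    return (
        -R * Ds * Ds - R * Ds * Ts,  # driver strands lost to driver-driver and driver-tracer duplexes
        R * Ds * Ds,                 # driver-driver duplexes
        -R * Ts * Ts - R * Ds * Ts,  # tracer strands lost to tracer-tracer and driver-tracer duplexes
        R * Ts * Ts,                 # tracer-tracer duplexes
        -R * Us * Us,                # target reanneals only with itself
        R * Us * Us,                 # double-stranded target
    )


def simulate_scheme1(D0, T0, U0, R, t_end, steps=20000):
    """Integrate from t = 0 to t_end with classical RK4; return (E_s, E_d)."""
    h = t_end / steps
    y = (D0, 0.0, T0, 0.0, U0, 0.0)
    for _ in range(steps):
        k1 = rates(y, R)
        k2 = rates(tuple(a + 0.5 * h * k for a, k in zip(y, k1)), R)
        k3 = rates(tuple(a + 0.5 * h * k for a, k in zip(y, k2)), R)
        k4 = rates(tuple(a + h * k for a, k in zip(y, k3)), R)
        y = tuple(a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
    Ds, Dd, Ts, Td, Us, Ud = y
    return Us / Ts, Ud / Td


def closed_form_scheme1(D0, T0, U0, R, t):
    """Closed-form scheme-1 enrichments, valid when D0 >> T0."""
    f = U0 / T0
    ratio = (1.0 + R * D0 * t) / (1.0 + R * U0 * t)
    return f * ratio, f * f * ratio
```

With dimensionless values such as $D_0 = 1000$, $T_0 = 1$, $U_0 = 0.01$, $R = 1$ and $t = 1$, the numerical and closed-form enrichments should agree to within a few percent; the small residual comes from the tracer-tracer and driver-tracer terms that the closed form neglects.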
We analysed various subtraction schemes used to find small differences between complex genomes, comparing the results for scheme 1 (double-stranded driver and tracer, the traditional method) and scheme 3 (complementary single-stranded driver and tracer). The genomic DNA was assumed to contain 70% unique sequences, and the target was assumed to be a single-copy sequence. The concentrations of driver and tracer and the hybridization conditions reproduced those used by Lisitsyn et al. (1993b), and the genome was assumed to be simplified 10-fold prior to hybridization. According to the calculations for scheme 1, after 20 h of hybridization in the first subtraction round the enrichment should be about 20 and the target content should remain below $10^{-4}$% (Fig. 4a). In the second round, again after 20 h of hybridization, these values should increase to 6500 and 0.1%, respectively. After the third round the target should make up as much as 46% of all tracer sequences. Similar calculations were done for initially unsimplified genomes (Fig. 4b); in this case four rounds of hybridization were sufficient to obtain a library in which 61% of clones contained target sequences. In contrast, the enrichment of target by scheme 3 (complementary single-stranded driver and tracer) increases exponentially with time (see the formulas in Appendix A and Table 1). Therefore, under the same hybridization conditions one could obtain a library consisting of 100% target sequences in a single round of subtraction (Fig. 5a). Moreover, the same enrichment could theoretically be achieved for unsimplified genomes after 150 h of hybridization (Fig. 5b). This time could be considerably shortened by accelerating reassociation with cationic detergents (Pontius and Berg, 1991) or by other means (Kohne et al., 1977).
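The contrast between the two schemes follows from the Appendix A expressions. Dividing by $f$, one round of scheme 1 gives the fold enrichment $(1+RD_0t)/(1+RU_0t)$, which can never exceed $D_0/U_0$, whereas scheme 3 gives $e^{RD_0t}$, so a $K$-fold enrichment requires $t = \ln K/(RD_0)$. The sketch below evaluates both. $R = 10^6$ M$^{-1}$ s$^{-1}$ is the value quoted in the text; the driver and target concentrations are hypothetical placeholders, not the conditions of Lisitsyn et al. (1993b), and the function names are illustrative.

```python
import math

R = 1.0e6  # M^-1 s^-1, reassociation rate quoted in the text


def fold_scheme1(D0, U0, t):
    """E_s(t)/f for double-stranded tracer and driver (bounded by D0/U0)."""
    return (1.0 + R * D0 * t) / (1.0 + R * U0 * t)


def fold_scheme3(D0, t):
    """E_s(t)/f for complementary single-stranded tracer and driver."""
    return math.exp(R * D0 * t)


def hours_for_fold_scheme3(D0, K):
    """Hybridization time, in hours, for scheme 3 to reach a K-fold enrichment."""
    return math.log(K) / (R * D0) / 3600.0


# Hypothetical placeholders: driver 1e-10 M, target 1e-14 M, one 20-h round.
D0, U0, t = 1.0e-10, 1.0e-14, 20 * 3600.0
print(f"scheme 1: {fold_scheme1(D0, U0, t):.1f}-fold")
print(f"scheme 3: {fold_scheme3(D0, t):.3g}-fold")
print(f"scheme 3 needs {hours_for_fold_scheme3(D0, 1e6):.1f} h for 1e6-fold")
```

With these placeholder concentrations a 20-h round gives roughly 8-fold enrichment by scheme 1 but roughly $1.3 \times 10^3$-fold by scheme 3, and scheme 3 would need about 38 h for a $10^6$-fold enrichment, which illustrates why its advantage grows with hybridization time.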
6. Conclusion

The kinetic models of subtractive hybridization that we have developed allowed us to analyse different schemes of genome subtraction and to predict their efficiency. Our calculations clearly demonstrate that hybridization of single-stranded driver and tracer is kinetically much superior to the other schemes of genome subtraction. In principle, this strategy would make unnecessary the preliminary simplification of genomes that otherwise seemed inevitable (Lisitsyn et al., 1993b).

Acknowledgements

We thank Dr. B. Glotov for help with the translation of the manuscript.

Appendix A: Final formulas for the enrichments of the targets after subtraction according to different schemes

(1) Double-stranded tracer and double-stranded driver (the most commonly used scheme):
$$E_s(t) = f\,\frac{1 + RD_0t}{1 + RU_0t}, \quad f = \frac{U_0}{T_0}; \qquad E_s(t) \to \frac{D_0}{T_0} \text{ when } t \to \infty;$$
$$E_d(t) = f^2\,\frac{1 + RD_0t}{1 + RU_0t}; \qquad E_d(t) \to f\,\frac{D_0}{T_0} \text{ when } t \to \infty.$$

(2) Single-stranded tracer and double-stranded driver:
$$E_s(t) = f\,(1 + RD_0t); \qquad E_s(t) \to \infty \text{ when } t \to \infty.$$

(3) Single-stranded complementary tracer and driver:
$$E_s(t) = f\,e^{RD_0t}; \qquad E_s(t) \to \infty \text{ when } t \to \infty.$$

(4) Double-stranded tracer and driver unable to self-reassociate:
$$E_s(t) = \frac{f\,e^{RD_0t}}{1 + RU_0t}; \qquad E_s(t) \to \infty \text{ when } t \to \infty;$$
$$E_d(t) = f^2\,\frac{2RD_0t}{(1 + RU_0t)\left(1 - e^{-2RD_0t}\right)}; \qquad E_d(t) \to 2f\,\frac{D_0}{T_0} \text{ when } t \to \infty.$$

(5) Double-stranded tracer and single-stranded driver:
$$E_d(t) = f^2\,\frac{RD_0t}{(1 + RU_0t)\left(1 - e^{-RD_0t}\right)}; \qquad E_d(t) \to f\,\frac{D_0}{T_0} \text{ when } t \to \infty.$$
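The limiting values of these expressions can be checked by coding them directly. This is a minimal sketch: `appendix_a` and `_exp` are hypothetical helper names, the inputs are the dimensionless products $a = RD_0t$ and $b = RU_0t$ (so that $D_0/U_0 = a/b$), and $f$ is assumed to be the initial target content $U_0/T_0$.

```python
import math


def _exp(x):
    """math.exp that returns infinity instead of raising OverflowError."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def appendix_a(scheme, f, a, b):
    """Return (E_s, E_d) for schemes 1-5; None where no formula is given.

    a = R*D0*t and b = R*U0*t (dimensionless, t > 0); f is assumed to be U0/T0.
    """
    if scheme == 1:  # ds tracer, ds driver
        ratio = (1.0 + a) / (1.0 + b)
        return f * ratio, f * f * ratio
    if scheme == 2:  # ss tracer, ds driver
        return f * (1.0 + a), None
    if scheme == 3:  # complementary ss tracer and driver
        return f * _exp(a), None
    if scheme == 4:  # ds tracer, driver unable to self-reassociate
        e_s = f * _exp(a) / (1.0 + b)
        e_d = f * f * 2.0 * a / ((1.0 + b) * (1.0 - math.exp(-2.0 * a)))
        return e_s, e_d
    if scheme == 5:  # ds tracer, ss driver
        e_d = f * f * a / ((1.0 + b) * (1.0 - math.exp(-a)))
        return None, e_d
    raise ValueError("scheme must be 1, 2, 3, 4 or 5")
```

For example, with $f = 0.01$ and $D_0/U_0 = 10^4$ ($a = 10^8$, $b = 10^4$), scheme 1 gives $E_d \approx 1.0 = f D_0/T_0$, scheme 4 gives $E_d \approx 2.0$, and scheme 5 again approaches $f D_0/T_0$, while the single-stranded enrichments of schemes 2-4 grow without bound.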
References

Bautz EKF, Reilly E. Gene-specific messenger RNA: isolation by the deletion method. Science 1966; 151: 328-330.
Bernardi G. The isochore organization of the human genome. Ann Rev Genet 1989; 23: 637-661.
Bishop E, Guellaen G, Geldwerth D, Voss R, Fellous M, Weissenbach J. Single-copy DNA sequences specific for the human Y chromosome. Nature 1983; 303: 831-832.
Britten RJ, Davidson EH. In: Hames BD, Higgins SJ (eds): Nucleic Acid Hybridisation. IRL Press, Oxford-Washington DC, 1985: 3-14.
Ermolaeva OD, Wagner MC. Isolation of DNA sequences deleted in lung cancer by genomic difference cloning. CABIOS 1995; 11(4): 457-462.
Fargnoli J, Holbrook NJ, Fornace AJ. Low-ratio hybridization subtraction. Anal Biochem 1990; 187: 364-373.
Jonsson JJ, Weissman SM. From mutation mapping to phenotype cloning. Proc Natl Acad Sci USA 1995; 92: 83-85.
Kohne DE, Levinson SA, Byers MJ. Room temperature method for increasing the rate of DNA reassociation by many thousandfold: the phenol emulsion reassociation technique. Biochemistry 1977; 16: 5329.
Kunkel LM, Monaco AP, Middlesworth W, Ochs HD, Latt SA. Specific cloning of DNA fragments absent from the DNA of a male patient with an X chromosome deletion. Proc Natl Acad Sci USA 1985; 82: 4778-4782.
Lamar E, Palmer E. Y-encoded, species-specific DNA in mice: evidence that the Y chromosome exists in two polymorphic forms in inbred strains. Cell 1984; 37: 171-177.
Lisitsyn NA, Lisitsina NM, Dalbagni G, Barker P, Sanchez CA, Gnarra J, Linehan WM, Reid BJ, Wigler MH. Comparative genomic analysis of tumors: detection of DNA losses and amplification. Proc Natl Acad Sci USA 1995; 92: 151-155.
Lisitsyn N, Wigler M. Representational difference analysis in detection of genetic lesions in cancer. Methods Enzymol 1995; 254: 291-304.
Lisitsyn NA, Segre JA, Kusumi K, Lisitsyn NM, Nadeau JH, Frankel WN, Wigler MH, Lander ES. Direct isolation of polymorphic markers linked to a trait by genetically directed representational difference analysis. Nat Genet 1994; 6: 57-63.
Lisitsyn NA, Rosenberg MV, Launer GA, Wagner LL, Potapov VK, Kolesnik TB, Sverdlov ED. A method for isolation of sequences missing in one of two related genomes. Mol Gen Microbiol Virusol 1993a; 3: 26-29.
Lisitsyn N, Lisitsyn N, Wigler M. Cloning the differences between two complex genomes. Science 1993b; 259: 946-951.
Lisitsyn NA, Launer GA, Wagner LL, Akopyanz NS, Martynov VI, Lelikova GP, Limborska SA, Polukarova LG, Sverdlov ED. Isolation of rapidly evolving genomic sequences: construction of a differential library and identification of a human DNA fragment that does not hybridize to chimpanzee DNA. Biomed Sci 1990; 1: 513-516.
McKusick VA. Current trends in mapping human genes. FASEB J 1991; 5: 12-20.
Milner JJ, Cecchini E, Dominy PJ. A kinetic model for subtractive hybridization. Nucleic Acids Res 1995; 23: 176-187.
Mor O, Messinger Y, Rotman G, Bar-Am I, Ravia Y, Eddy RL, Shows TB, Park JG, Gazdar AF, Shiloh Y. Novel DNA sequences at chromosome 10q26 are amplified in human gastric carcinoma cell lines: molecular cloning by competitive DNA reassociation. Nucleic Acids Res 1991; 19: 117-123.
Myers MR. The pluses of subtraction. Science 1993; 259: 942-943.
Nussbaum RL, Lesko JG, Lewis RA, Ledbetter SA, Ledbetter DH. Isolation of anonymous DNA sequences from within a submicroscopic X chromosomal deletion in a patient with choroideremia, deafness and mental retardation. Proc Natl Acad Sci USA 1987; 84: 6521-6525.
Pontius BW, Berg P. Rapid renaturation of complementary DNA strands mediated by cationic detergents: a role for high-probability binding domains in enhancing the kinetics of molecular assembly processes. Proc Natl Acad Sci USA 1991; 88: 8237-8241.
Rosenberg M, Przybylska M, Straus D. RFLP subtraction: a method for making libraries of polymorphic markers. Proc Natl Acad Sci USA 1994; 91: 6113-6117.
Schutte M, Da Costa LT, Hahn SA, Moskaluk C, Hoque ATMS, Rosenblum E, Weinstein CL, Bittner M, Meltzer PS, Trent JM, Yeo CJ, Hruban RH, Kern SE. Identification by representational difference analysis of a homozygous deletion in pancreatic carcinoma that lies within the BRCA2 region. Proc Natl Acad Sci USA 1995; 92: 5950-5954.
Shiloh Y, Rose E, Colletti-Feener C, Korf B, Kunkel LM, Latt SA. Rapid cloning of multiple amplified nucleotide sequences from human neuroblastoma cell lines by phenol emulsion competitive DNA reassociation. Gene 1987; 51: 53-59.
Straus D, Ausubel FM. Genomic subtraction for cloning DNA corresponding to deletion mutations. Proc Natl Acad Sci USA 1990; 87: 1889-1893.
Sverdlov ED, Ermolaeva OD. Kinetic analysis of subtractive hybridization of transcripts. Bioorg Khim 1994; 20: 506-514.
Sverdlov ED, Ermolaeva OD. Subtractive hybridization. Theoretical analysis and a principle of the 'trapper'. Bioorg Khim 1993; 19: 1081-1088.
Tautz D. Genomic fingerprinting goes simple. Bioessays 1990; 12: 44-46.
Therman E. Human Chromosomes: Structure, Behavior, Effects. 2nd ed. Springer-Verlag, New York, Berlin, Heidelberg, Tokyo, 1986.
van Ommen GJ, Verkerk JMH, Hofker MH, Monaco AP, Kunkel LM, Ray P, Worton R, Wieringa B, Bakker E, Pearson PL. A physical map of 4 million bp around the Duchenne muscular dystrophy gene on the human X-chromosome. Cell 1986; 47: 499-504.
Welsh J, McClelland M. Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res 1990; 18: 7213-7218.
Wieland I, Bohm M, Bogatz S. Isolation of DNA sequences deleted in lung cancer by genomic difference cloning. Proc Natl Acad Sci USA 1992; 89: 9705-9709.
Wieland I, Bolger G, Asouline G, Wigler M. A method for difference cloning: gene amplification following subtractive hybridization. Proc Natl Acad Sci USA 1990; 87: 2720-2724.
Williams JGK, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 1990; 18: 6531-6535.