J. theor. Biol. (1987) 127, 229-245
Breakage of Double-stranded DNA Due to Single-stranded Nicking
RICHARD COWAN,† CHRISTINA M. COLLIS‡ AND GEOFFREY W. GRIGG‡
† CSIRO Division of Mathematics & Statistics, Box 218, Lindfield, NSW, Australia 2070 and ‡ CSIRO Division of Molecular Biology, Box 184, North Ryde, NSW, Australia 2113

(Received 10 December 1986)

Enzymes such as pancreatic deoxyribonuclease (DNase I) nick the single strands of double-stranded DNA. Two nicks sufficiently close on opposite strands will lead to breakage of the DNA molecule. This paper gives a mathematical model for the breakage of circular, supercoiled DNA under the action of an enzyme which nicks at random sites (or at preferred sites, these being in abundance and randomly positioned around the circle). After the first nick the DNA loses its supercoiled structure; after many nicks it breaks to become topologically linear; further nicks lead to fragmentation of this linear form. Formulae are given for the proportions of DNA molecules in each of the four classes: supercoiled; nicked but still circular; linear; fragmented. Formulae are also presented for the case when there is, in addition to nicking, simultaneous action of an endonuclease which produces direct double-stranded breaks in the DNA. Finally, a general theory is given for the case where a third type of enzyme, topoisomerase I, is operative, with all three DNA modifications taking place simultaneously.
0022-5193/87/140229+17 $03.00/0 © 1987 Academic Press Ltd

1. Introduction
This paper describes a mathematical model for the breakage of a circular, supercoiled, double-stranded DNA molecule when it is exposed to an enzyme that "nicks" each of the single strands at random sites and times. Pancreatic deoxyribonuclease, also called DNase I, is such an enzyme. A nick breaks one of the phosphodiester bonds in the DNA sugar-phosphate chain and thus separates two successive bases on one strand of the DNA molecule. The first nick releases stresses and the molecule changes from a supercoiled form to a circle of DNA which is no longer supercoiled. This form, often called "form II" in contrast to the supercoiled form, which is called "form I", we shall refer to as relaxed. Subsequent nicks give the molecule a chance to break, for if two nicks on opposite strands are sufficiently close, the molecule breaks and loses its circular structure. It becomes, in a topological sense, linear (often called "form III"). Continued exposure to the enzyme will cause other double-stranded breaks, so eventually the linear molecule becomes further fragmented (this has not yet been called "form IV").

The times of transition from supercoiled to relaxed to linear to fragmented are variable, due to the essentially stochastic nature of molecular interactions and the positioning of nicks. In this paper we find the probability distributions of these transition times. Consequently, we are able to write down mathematical expressions for the proportions of DNA molecules at different times in each of the four populations (supercoiled, relaxed, linear, fragmented), assuming an initial pool of supercoiled molecules. Such formulae are necessary to interpret experimental data obtained by separating these forms electrophoretically on non-denaturing agarose gels. The first three forms give distinct bands on such a gel.

We also consider the more general situation where the DNA molecules are exposed simultaneously to two enzymes. One creates nicks on single strands (and eventually creates double-stranded breaks); the other cleaves double-stranded DNA directly. (Of course, a single enzyme may have two active regions, one that nicks and one that cleaves.) A full theory of this case is given. In addition we consider a simplified theory which is appropriate when double-stranded cleavage is the dominant mechanism for breakage. This simple theory relates to the work of Povirk et al. (1977). In the general theory, consideration is also given to the situation where the initial pool of molecules is not 100% supercoiled, but instead a mixture of forms.

Freifelder & Trumbo (1969), in considering problems of radiation-induced breakage of DNA, have developed an approximate formula for the transition from relaxed to linear forms. Ionising radiation produces single-strand breaks together with some direct double-stranded breaks. Freifelder & Trumbo do not consider the further transition to the fragmented form, although we shall see that a formula for the proportion of fragmented molecules is useful when interpreting the agarose gel experiments. The mathematical work of Freifelder & Trumbo concerning the relaxed-linear transition uses numerous approximations which they do not justify. Our mathematical analysis of this transition avoids these approximations, so we are able to provide an exact formula for comparison with their approximation.

The paper concludes with remarks on the effect of topoisomerase I activity.
These enzymes also create a "relaxed" form from the supercoiled form. The enzyme attaches to the DNA, breaks one strand, unwinds the supercoiling and finally repairs the single-stranded break. Such molecules are indistinguishable on standard agarose gels from the nicked, relaxed form. They are often referred to as relaxed molecules, but we shall exclude them from consideration until the last section of this paper.

2. The Mathematical Model
At time zero we have a pool of supercoiled DNA molecules exposed to a pool of enzyme molecules (say DNase I). At various times an enzyme molecule nicks one of the DNA molecules and then returns to the enzyme pool. Standard ideas of stochastic chemistry state that the random collection of times at which a given DNA molecule is nicked forms a Poisson process. By this we mean that the numbers of nicks on that molecule in non-overlapping time periods are statistically independent and the probability of $n$ nicks in any time period of duration $T$ is $(\lambda T)^n e^{-\lambda T}/n!$, where $\lambda$ is the average number of nicks per unit time per DNA molecule. In the model, the site at which a nick occurs is random. Specifically, we assume that the site is equally likely to be on either of the two strands and that it is equally likely to take any position around the circle of DNA. In statistical terminology, the
position is uniformly distributed around the circle. All sites of nicking are assumed statistically independent of each other.

The assumption of randomness of nicking sites does not imply that the enzyme is indiscriminate in its choice of site, though DNase I is known to nick between any two nucleotides with only mild preference for some pairs over others (Laskowski, 1971) and is often regarded as a "random" nicking agent (Maniatis et al., 1982). In general the enzyme may well act preferentially at sites with a particular stereochemistry, for example, sites with particular helix groove structures or phosphate backbone geometry (Drew, 1984; Drew & Travers, 1984). But these structures are generally distributed randomly (and often abundantly) around a large DNA circle, due to the mixed character of nucleotide sequences. There is, as we shall show, a mathematical equivalence between the case where nicking is indiscriminately random and the case where nicking has a large number of preferred sites, these being randomly distributed.

Suppose we think of the problem in terms of $N$ preferred sites, these being randomly, uniformly distributed around the circle. At time $t$, the chance of any particular site being un-nicked is $e^{-\lambda t/N}$. This is consistent with the whole molecule being nicked at rate $\lambda$. Thus the probability that $n$ (out of the $N$) sites have been nicked at $t$ is
$$\binom{N}{n}\left(1-e^{-\lambda t/N}\right)^{n} e^{-\lambda t (N-n)/N},$$
a Binomial probability. The $n$ nicked sites, being just a random sample from the $N$, will themselves be randomly, uniformly distributed. If $N$ is large relative to $\lambda t$, as it certainly is in any experiment with enzymes like DNase I, this Binomial probability distribution is very accurately approximated by a Poisson distribution with mean $N(1-e^{-\lambda t/N})$. Under the condition that $N$ is much larger than $\lambda t$, this mean is effectively equal to $\lambda t$, since $(1-e^{-\lambda t/N})$ is accurately approximated by $\lambda t/N$.
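The Binomial-to-Poisson approximation above can be checked numerically. The following Python sketch is ours rather than part of the original analysis; the function names and the illustrative values $N = 4000$ and $\lambda t = 10$ are assumptions chosen only to show the size of the discrepancy.

```python
import math


def binomial_pmf(n, sites, p):
    """P(n of `sites` preferred sites are nicked), each independently with probability p."""
    return math.comb(sites, n) * p**n * (1.0 - p) ** (sites - n)


def poisson_pmf(n, mean):
    """Poisson probability of n events with the given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def compare(sites, dose, n_max=40):
    """Return (binomial mean, largest pointwise gap between the two distributions)."""
    p = 1.0 - math.exp(-dose / sites)   # chance that a given site is nicked by time t
    mean = sites * p                    # N(1 - exp(-lambda t / N))
    gap = max(abs(binomial_pmf(n, sites, p) - poisson_pmf(n, mean))
              for n in range(n_max + 1))
    return mean, gap


if __name__ == "__main__":
    # Illustrative (assumed) values: 4000 preferred sites, nicking dose lambda*t = 10.
    mean, gap = compare(sites=4000, dose=10.0)
    print(f"binomial mean = {mean:.4f}, largest pmf gap = {gap:.2e}")
```

With $N$ in the thousands the mean is very close to $\lambda t$ and the two distributions differ only slightly, which is the sense in which the two nicking models are treated as equivalent.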
Thus, any mathematical difference between the indiscriminately random enzyme and the enzyme with an abundance of preferred sites is negligible. Indeed, one can extend the argument of the preceding paragraph to show that the mathematics of the problem is unchanged even if the $N$ preferred sites differ in their affinity with the enzyme. (Here one uses the theorem [see Feller, 1965, p. 263 for a precise statement] which states that the number of "successes" in a large number of trials has a Poisson distribution even though the chance of success differs from trial to trial.)

The assumption that nick positions are statistically independent will be valid unless the appearance of a nick enhances or inhibits the chances of a nick nearby. No such enhancement or inhibition is known at present for DNase I, but some dependence of nicking sites has been shown for an enzyme such as micrococcal nuclease (Cockell et al., 1983).

Let us refer to the strands by number, 1 and 2. For each nick on strand 1 we assume that there is a zone on strand 2 centred at the nick and occupying a proportion $b$ of the circle circumference. If there is no strand-2 nick in any of these (potentially overlapping) zones, then the molecule will be relaxed and appear in the relaxed band on the agarose gel. If there is exactly one such strand-2 nick, the molecule
will break and appear in the linear band of the gel. More than one such strand-2 nick usually implies that the molecule is fragmented. In this case, there is no definable band on the gel; such molecules will be distributed diffusely due to their highly variable sizes. Of course, if there are no nicks on either strand the DNA molecules will appear in the supercoiled band. In the following sections we find the proportions of molecules in each of the four populations. In particular we introduce the symbols S, R, L and F for the proportions in the supercoiled, relaxed, linear and fragmented populations and provide formulae (1), (3), (5) and (8) for their calculation. A BASIC computer coding, suitable for most desktop computers, is also provided. Readers not primarily concerned with the mathematical derivation can skip to section 6.
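The model of this section lends itself to a direct simulation. The sketch below is our own illustration, not the authors' method: it places $n$ nicks uniformly on a circle of unit circumference, assigns each to a strand with probability 1/2, and declares the molecule relaxed if no strand-2 nick lies within $b/2$ of a strand-1 nick. The function names, trial count and seed are assumptions.

```python
import random


def is_relaxed(n, b, rng):
    """Simulate n nicks; True if no strand-2 nick falls in a strand-1 taboo zone."""
    strand1, strand2 = [], []
    for _ in range(n):
        # Each nick is equally likely to be on either strand, at a uniform position.
        (strand1 if rng.random() < 0.5 else strand2).append(rng.random())
    half_zone = b / 2.0
    for y in strand2:
        for x in strand1:
            d = abs(x - y)
            if min(d, 1.0 - d) < half_zone:   # distance measured around the circle
                return False
    return True


def fraction_relaxed(n, b, trials=20000, seed=1):
    """Monte Carlo estimate of the probability of remaining relaxed after n nicks."""
    rng = random.Random(seed)
    return sum(is_relaxed(n, b, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(fraction_relaxed(10, 0.005))
```

For $n = 10$ and $b = 0.005$ the estimate can be compared with the figure of 89% quoted in section 4.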
3. The Proportion of Supercoiled Molecules
The first transition, from supercoiled to relaxed, is mathematically trivial. At time $t$, the probability that a given molecule remains supercoiled equals the probability of no nicks during the time up to $t$. This is clearly $e^{-\mu}$, where $\mu$ equals $\lambda t$, the average number of nicks per molecule up until time $t$. One can think of $\mu$ as a "nicking dose". (Since our results involve $\lambda$ and $t$ via their product only, we use $\mu$ for brevity; time is implicit in $\mu$.) Thus, if $S(\mu)$ denotes the expected proportion of molecules in supercoiled form when the molecules have been exposed to an average of $\mu$ nicks (a nicking dose of $\mu$), then
$$S(\mu) = e^{-\mu}. \tag{1}$$
The remaining transitions, from relaxed to linear to fragmented, involve more elaborate mathematics.
4. The Proportion of Relaxed Molecules
In this section we derive the probability $R_n$ that the molecule remains relaxed after a total of $n$ nicks. Then, knowing the probability distribution of the number of nicks when the nicking dose is $\mu$, we find the probability that the molecule remains in the relaxed form after dose $\mu$. The results are quite simple to derive, but depend upon a fairly substantial theorem on "coverage probabilities" proved by Siegel (1978).

Given a total of $n$ nicks distributed around both strands, there is a probability
$$\binom{n}{j} 2^{-n}$$
of having $j$ nicks in strand 1, $j = 0, 1, \ldots, n$. This would then create $j$ potentially overlapping taboo zones of length $b$ on strand 2. The $n-j$ nicks on strand 2 must avoid these taboo zones if the molecule is to remain relaxed. Let $V$ be the proportion of the circle uncovered by these $j$ taboo zones ($V$ for vacant). Note that $V$ is a random variable. Given $n$, $j$ and $V$, the probability that the $n-j$ nicks on strand 2 all lie within this vacant proportion is $V^{n-j}$. Thus, conditional upon $n$ and $j$ but no longer upon $V$, the probability that none of the $n-j$ strand-2 nicks lie within the taboo zones of the circle is the expectation of $V^{n-j}$ given $n$ and $j$. This quantity has been found by Siegel (1978, Theorem 2). It is 1 when $j = 0$ or $n$, whilst for $1 \le j \le n-1$ it is
$$\binom{n-1}{j-1}^{-1} \sum_{k=1}^{j} \binom{n-j-1}{k-1}\binom{j}{k}(1-kb)_+^{\,n-1},$$
where $(1-kb)_+$ denotes the larger of $1-kb$ and zero. Thus the summation, written as $k = 1$ to $k = j$, will contain zero terms if $j > 1/b$. There will also be zero terms if $j > n-j$, because the first combinatorial coefficient within the summation becomes zero.

Thus, conditional only upon $n$, the probability of no strand-2 nicks lying in the taboo zones created by strand-1 nicks is
$$
\begin{aligned}
R_n &= 2^{-n}\left[2 + \sum_{j=1}^{n-1}\sum_{k=1}^{j} \frac{n}{j}\binom{j}{k}\binom{n-j-1}{k-1}(1-kb)_+^{\,n-1}\right] \\
&= 2^{-n}\left[2 + \sum_{k=1}^{n-1} \frac{n(1-kb)_+^{\,n-1}}{k} \sum_{j=k}^{n-1}\binom{j-1}{k-1}\binom{n-j-1}{k-1}\right] \\
&= 2^{-n}\left[2 + \sum_{k=1}^{[n/2]} \frac{n}{k}\binom{n-1}{2k-1}(1-kb)_+^{\,n-1}\right] \\
&= 2^{1-n}\sum_{k=0}^{[n/2]} \binom{n}{2k}(1-kb)_+^{\,n-1}.
\end{aligned}
\tag{2}
$$
The last two summations run to $k = [n/2]$, defined as the integer part of $n/2$, but they will include zero terms if $k$ exceeds $1/b$. Note that the second last expression is derived using standard combinatorial methods (Feller, 1965, p. 61).

Computations with (2) give an indication of the number of nicks needed to break the molecule. For example, with $b = 0.005$, 89% of molecules with 10 nicks will still be relaxed, as will 62% of those with 20, 33% with 30, 14% with 40 and 5% with 50.

After nicking dose $\mu$ there is a random Poisson-distributed number of nicks on the molecule. So the probability, $R(\mu)$ say, that the molecule is in the relaxed form after dose $\mu$ is
$$R(\mu) = \sum_{n=1}^{\infty} e^{-\mu}\mu^{n}R_n/n!.$$
Substitution of (2) and interchange in the order of summation between indices $n$ and $k$ yields
$$R(\mu) = 2e^{-\mu/2} - 2e^{-\mu} + \mu X, \tag{3}$$
where
$$X = \sum_{k=1}^{\infty} e^{-\mu(1+kb)/2}\,[\mu(1-kb)_+/2]^{2k-1}/(2k)!. \tag{4}$$
All terms with $k > 1/b$ in formula (4) are zero. In any case the summation effectively converges after relatively few terms, due to the rapidly increasing denominator $(2k)!$. The formula is easy to compute; the following lines of BASIC code suffice.

10 INPUT MU, B
20 GOSUB 100
30 PRINT R
40 END
100 REM-CALC R
110 S = EXP(-MU)
120 FAC = LOG(2)
130 M2 = MU/2
140 X = 0
150 W1 = -999999
160 K = 1
170 KB = K*B
180 IF KB > 1 THEN 270
190 W2 = W1
200 W1 = (2*K-1)*LOG(M2*(1-KB)) - M2*(1+KB) - FAC
210 IF W1 < -20 AND W1 < W2 THEN 270
220 EW = EXP(W1)
230 X = X + EW
240 K = K + 1
250 FAC = FAC + LOG(2*K*(2*K-1))
260 GO TO 170
270 R = MU*X + 2*(EXP(-M2) - S)
280 RETURN

5. The Proportion of Linear Molecules
When a strand-2 nick first occurs in one of the taboo zones the molecule will break. On most occasions this nick will be within the critical breaking distance, $b/2$, from only one strand-1 nick. Occasionally, however, the strand-2 nick will occur within the critical distance of two or more strand-1 nicks. In this situation, one would expect that the strands would separate between the strand-2 nick and each strand-1 nick. The molecule would break but lose a tiny piece of single-stranded material in the process. The lost piece would be shorter than a taboo zone (a proportion $b$ of the molecule) and, since $b$ is known to be small, the broken molecule would still appear in the linear band of the gel, despite its slight degradation. Because it is indistinguishable on the gel from linear molecules, we shall still classify this mildly fragmented molecule as linear.

The more usual type of DNA degradation is qualitatively different from this. Usually there is a second break in the molecule, creating two random-sized fragments both significantly smaller than the full molecule. These pieces do not appear in the linear band of the gel. Nor do they form any noticeable band on the gel, because of their spread of sizes. We refer to molecules incurring this type of breakage as fragmented.

Using these definitions of "linear" and "fragmented", we are not able to derive an exact mathematical expression for the proportion, $L(\mu)$ say, of linear molecules after nicking dose $\mu$. At first glance it would seem quite simple. Given $n$ nicks in all, with $j$ on strand 1 creating a vacant zone occupying a proportion $V$ of the molecule, a linear molecule is formed if exactly one of the $n-j$ strand-2 nicks lies within the non-vacant zone. The conditional probability of this is $(n-j)V^{n-j-1}(1-V)$. To this must be added the chances that more than one of the $n-j$ strand-2
nicks lies in the non-vacant zone, but in an arrangement which creates only the "minor" loss mentioned above. The mathematical complexities of these additional cases are considerable. Rather than ignore them on the grounds that they occur rarely, we can find upper and lower bounds for their probability. We can show (details available on request) that, given $n$, $j$ and $V$, the probability of $i$ strand-2 nicks in the non-vacant zone in an arrangement which creates only minor loss is greater than
$$\binom{n-j}{i} V^{n-j-i}(1-V)\,b^{i-1}.$$
This lower bound is for $i \ge 1$, $n \ge 2$ and $1 \le j \le n-1$; it is zero otherwise. One can also prove that the probability is less than $i!$ times this lower bound (though much closer to the lower bound than to this upper bound).

In a manner similar to the derivation of (3), we are able to obtain lower and upper bounds for $L(\mu)$, the probability that the molecule is linear after dose $\mu$ (details available). We have that
$$L(\mu) > b^{-1}\left(e^{\mu b/2}-1\right)\left(\mu X - Y + e^{-\mu/2} - e^{-\mu}\right), \tag{5}$$
$$L(\mu) < \mu(2-b\mu)^{-1}\left(\mu X - Y + e^{-\mu/2} - e^{-\mu}\right), \tag{6}$$
where
$$Y = \sum_{k=1}^{\infty} e^{-\mu(1+kb)/2}\,[\mu(1-kb)_+/2]^{2k-1}\,[2k + \mu(1-kb)/2]/(2k)!. \tag{7}$$
TABLE 1
Values of $L(\mu)$, the proportion of linear molecules after nicking dose $\mu$ (that is, after time $t$ at nicking rate $\lambda$, where $\mu = \lambda t$). A comparison of the lower-bound formula (5) with the upper-bound formula (6). The left column, $\mu$, is the average number of nicks per molecule; $b$ equals 0.005.

  $\mu$    Lower bound    Upper bound
    1       0.00125        0.00125
    2       0.00495        0.00496
    5       0.02993        0.03012
   10       0.10800        0.10939
   20       0.29625        0.30407
   30       0.36858        0.38356
   40       0.29632        0.31276
   50       0.17344        0.18573
   60       0.07839        0.08521
   70       0.02836        0.03130
The bounds (5) and (6) are in close agreement when $b$ is small. Table 1 shows this for an example where $b = 0.005$. This lies within the range of realistic $b$ values. So, although we do not have an exact formula for $L(\mu)$, we have usefully close lower and upper bounds. We recommend the use of (5), for this is very close to being an exact formula for $L(\mu)$. To calculate $L(\mu)$ using (5) we can change the BASIC lines 30 and 100 to "30 PRINT R, L" and "100 REM-CALC R & L" and add the following lines.

145 Y = 0
235 Y = Y + EW*(2*K + M2*(1-KB))
273 MB = M2*B
275 COEFF = M2*(1 + MB*(0.5 + MB/6))
277 L = COEFF*(MU*X - Y + EXP(-M2) - S)

One can use "275 COEFF = M2*(1 + MB*(1 + MB))" to calculate $L(\mu)$ with the upper-bound formula (6), though this will not be needed if $b$ is small.

6. The Four Molecular Populations
Formulae (1), (3) and (5) give the proportions $S(\mu)$, $R(\mu)$ and $L(\mu)$ of supercoiled, relaxed and linear molecules after nicking dose $\mu$, that is, after the average number of nicks per molecule has reached $\mu$. If the remaining population, the fragmented molecules, has proportion denoted by $F(\mu)$, then
$$F(\mu) = 1 - S(\mu) - R(\mu) - L(\mu). \tag{8}$$
For the computation of all four quantities, adjust the BASIC code given above in sections 4 and 5 by adding "278 F = 1 - S - R - L" and changing to "30 PRINT S,R,L,F" and "100 REM-CALC S,R,L,F".
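For readers without a BASIC interpreter, the complete program (lines 100 to 280 with the additions of section 5 and line 278) can be transcribed into Python roughly as follows. This is our port, offered as a sketch: the function name `populations` and the `upper` flag are assumptions, the comments give the BASIC line numbers being mirrored, and the loop exits when $kb \ge 1$ (rather than $kb > 1$) so that the logarithm of zero is never taken.

```python
import math


def populations(mu, b, upper=False):
    """Return (S, R, L, F) after nicking dose mu, following the BASIC program.

    With upper=False, L uses the series form of the lower-bound coefficient (line 275);
    with upper=True it uses the alternative line 275 given for the upper bound (6).
    """
    if mu <= 0.0:
        return 1.0, 0.0, 0.0, 0.0
    s = math.exp(-mu)                       # 110
    log_fact = math.log(2.0)                # 120: log of (2k)! for k = 1
    m2 = mu / 2.0                           # 130
    x = 0.0                                 # 140
    y = 0.0                                 # 145
    w1 = -999999.0                          # 150
    k = 1                                   # 160
    while k * b < 1.0:                      # 170-180
        kb = k * b
        w2 = w1                             # 190
        w1 = (2 * k - 1) * math.log(m2 * (1.0 - kb)) - m2 * (1.0 + kb) - log_fact  # 200
        if w1 < -20.0 and w1 < w2:          # 210: term is negligible and decreasing
            break
        ew = math.exp(w1)                   # 220
        x += ew                             # 230
        y += ew * (2 * k + m2 * (1.0 - kb))  # 235
        k += 1                              # 240
        log_fact += math.log(2 * k * (2 * k - 1))  # 250
    r = mu * x + 2.0 * (math.exp(-m2) - s)  # 270
    mb = m2 * b                             # 273
    if upper:
        coeff = m2 * (1.0 + mb * (1.0 + mb))
    else:
        coeff = m2 * (1.0 + mb * (0.5 + mb / 6.0))   # 275
    l = coeff * (mu * x - y + math.exp(-m2) - s)      # 277
    f = 1.0 - s - r - l                     # 278
    return s, r, l, f


if __name__ == "__main__":
    for dose in (1, 10, 30, 70):
        print(dose, [round(v, 5) for v in populations(dose, 0.005)])
```

The third entry of each printed row can be compared with the lower-bound column of Table 1.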
FIG. 1. The molecular proportions for (a) all four forms, and (b) the three non-fragmented forms relative to their sum. The proportions are plotted against $\mu$, the average number of nicks per molecule. The scale for $\mu$ is linear in $\ln(1+\mu)$. The parameter $b$ is set at 0.005. The initial mix is 100% supercoiled. There is only a nicking action.
We now illustrate graphically the nature of these formulae. For illustration we choose $b = 0.005$, corresponding to a typical phage or plasmid of 4000 nucleotide pairs, with a hypothetical taboo zone of 20 nucleotides. Figure 1(a) shows a plot of $S$, $R$, $L$ and $F$ as functions of $\mu$ using formulae (1), (3), (5) and (8) and the BASIC program. It shows the change in the proportions of all four molecular populations as the nicking dose increases, starting from 100% supercoiled and ending with 100% fragmented. Since the fragmented population is not observable as a defined band on agarose gels, we show $S$, $R$ and $L$ as proportions of $S + R + L$ in Fig. 1(b). For all values of $b$, the curves are similar in character: an exponentially declining supercoiled population, reaching negligible proportions when the relaxed population reaches its maximum and when the linear forms first appear.
7. Single-stranded Nicking Combined With Double-stranded Cleaving
In this section we modify the foregoing theory to cover the situation where there is simultaneous exposure to a nicking agent and a cleaving agent acting independently. The cleaving agent may be a separate endonuclease which, when acting on the DNA molecule, produces a double-stranded break directly. (We refer to these breaks as "cuts" and so use "cleaving" and "cutting" synonymously. It is immaterial whether a cut is staggered or straight.) We assume that the nicking dose is $\mu$ (as before) and the "cutting dose" is $\phi$. Both $\mu$ and $\phi$ have an implicit proportionality with time. The numbers of nicks and cuts on any molecule will each have a Poisson distribution. All molecules start as supercoiled.

We define $S(\mu, \phi)$, $R(\mu, \phi)$, $L(\mu, \phi)$ and $F(\mu, \phi)$ as the proportions of the four molecular forms. Molecules which are un-nicked will be either supercoiled, linear or fragmented depending upon whether they have zero, one or more than one cut. Molecules that are nicked but would be relaxed under nicking alone will appear as either relaxed, linear or fragmented as the number of cuts is zero, one or more than one. Those that nicking alone has made linear will remain linear if there are no cuts, but will otherwise be fragmented. The probability of no cuts is $e^{-\phi}$, of one cut $\phi e^{-\phi}$. Thus
$$S(\mu, \phi) = e^{-\phi}S(\mu) \tag{9}$$
$$R(\mu, \phi) = e^{-\phi}R(\mu) \tag{10}$$
$$L(\mu, \phi) = e^{-\phi}\left[L(\mu) + \phi\{S(\mu) + R(\mu)\}\right] \tag{11}$$
$$F(\mu, \phi) = 1 - S(\mu, \phi) - R(\mu, \phi) - L(\mu, \phi). \tag{12}$$
Thus it is a simple matter to compute the four population proportions. Figure 2(a) shows typical curves for the cases where the cutting rate is 0.2 and 0.5 times the nicking rate, $b$ being 0.005 as before. Figure 2(b) shows these cases plotted as proportions of the non-fragmented molecules. Note that the addition of cleaving produces marked changes in the size of the relaxed peak and in the timing of the onset of linear and fragmented forms. Obviously, for small values of $\mu$ (at early times), there is a much higher proportion of linear and fragmented molecules than in Figs 1(a) and 1(b), together with lower proportions of relaxed and supercoiled. This is the effect of the cutting action.

FIG. 2. The molecular proportions for (a) all four forms, and (b) the three non-fragmented forms relative to their sum. The proportions are plotted against $\mu$, the average number of nicks per molecule. The scale for $\mu$ is linear in $\ln(1+\mu)$. The parameter $b$ is set at 0.005. The initial mix is 100% supercoiled. There is both nicking and cutting. In both (a) and (b) the solid curves refer to the case where $\phi = 0.5\mu$. Here, $\phi$ is the average number of cuts per molecule. The dashed curves show the case $\phi = 0.2\mu$.

In section 5 we discussed the possibility of minor fragmentary loss for molecules that appear in the linear band. Small pieces of single-stranded DNA of length less than $b$ could be lost by one nick being in the taboo zone of two or more nicks on the opposite strand. Similarly there can be minor fragmentary loss due to the combined action of nicking and cutting. When a relaxed (and therefore uncut) molecule incurs its first cut, we have said that it becomes linear. If a nick lies within $b/2$ of this cut, however, a tiny fragment is lost. Since $b$ is small, these molecules still appear in the linear band and are sensibly named "linear".
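Equations (9) to (12) amount to a simple reweighting of the nicking-only proportions. A minimal Python sketch follows; it is ours, the function name `with_cutting` is hypothetical, and the numerical inputs in the usage example are illustrative placeholders rather than values taken from the text.

```python
import math


def with_cutting(s_nick, r_nick, l_nick, phi):
    """Combine nicking-only proportions S(mu), R(mu), L(mu) with cutting dose phi.

    Implements equations (9)-(12): no cut leaves a molecule in its nicking-only class,
    exactly one cut turns a supercoiled or relaxed molecule into a linear one, and
    every other combination is counted as fragmented.
    """
    no_cut = math.exp(-phi)                              # probability of zero cuts
    s = no_cut * s_nick                                  # (9)
    r = no_cut * r_nick                                  # (10)
    l = no_cut * (l_nick + phi * (s_nick + r_nick))      # (11)
    f = 1.0 - s - r - l                                  # (12)
    return s, r, l, f


if __name__ == "__main__":
    # Placeholder nicking-only proportions (assumed, for illustration only).
    print(with_cutting(s_nick=0.20, r_nick=0.70, l_nick=0.08, phi=0.5))
```

Setting `phi=0` returns the nicking-only proportions unchanged, which is a quick consistency check on any implementation.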
8. A Simplified Theory When Cleaving is the Dominant Mechanism
There are circumstances when the cleaving mechanism dominates the nicking mechanism in the formation of linear and fragmented molecules. Whilst many nicks on average are needed to create a linear or fragmented molecule, just one cut (for linear) and two cuts (for fragmented) will suffice. Unless $\phi \ll \mu$ the only noticeable effect of nicking is in the creation of the relaxed form from the supercoiled. It is easy to show that, if double-stranded breakage due to nicking is ignored, then
$$S(\mu, \phi) = e^{-(\mu+\phi)} \tag{13}$$
$$R(\mu, \phi) = (1-e^{-\mu})\,e^{-\phi} \tag{14}$$
$$L(\mu, \phi) = \phi\,e^{-\phi} \tag{15}$$
$$F(\mu, \phi) = 1 - e^{-\phi}(1+\phi). \tag{16}$$
Notably, these new formulae do not depend upon $b$, the size of the taboo zone. This is to be expected since the simplified theory completely ignores the possibility of breakage due to nicking. If (13)-(15) are divided by $1 - F(\mu, \phi)$, we get the relative proportions of supercoiled, relaxed and linear amongst the non-fragmented forms. Writing these as $S'$, $R'$ and $L'$ we have
$$S' = e^{-\mu}/(1+\phi) \tag{17}$$
$$R' = (1-e^{-\mu})/(1+\phi) \tag{18}$$
$$L' = \phi/(1+\phi). \tag{19}$$
If the relative proportions are determined from gel electrophoresis data, then $\mu$ and $\phi$ can be calculated:
$$\phi = L'/(1-L') \tag{20}$$
$$\mu = -\ln[S'(1+\phi)]. \tag{21}$$
Povirk et al. (1977) have utilised this simplified mathematics to estimate single-stranded and direct double-stranded breakage rates for Col E1 DNA treated with bleomycin. They did not, however, correct for the uncounted fragmented molecules and calculated $\mu$ and $\phi$ from (13) and (15) rather than from the correct formulae (20) and (21). This omission did not seriously distort their estimates of $\mu$ and $\phi$ since $F(\mu, \phi)$ was near zero for their $\mu$ and $\phi$. For Col E1, they found that $\mu$ was approximately 9 times $\phi$ at various bleomycin doses.

Equations (13)-(16) are an oversimplification when cleaving is not the dominant breakage mechanism. The more technical mathematics leading to (9)-(12) is essential in this case. Figure 3 illustrates this by plotting the relaxed and linear molecules as a proportion of the unfragmented molecules for both the exact and simplified theory. Various ratios of $\mu$ to $\phi$ are shown, including the 9 to 1 case of Povirk et al. We do not plot the comparison of supercoiled proportions [that is, of $S/(1-F)$] because the full and simplified theories give virtually equal results. This equality happens because the formulae for $S$ [namely (9) and (13)] are identical and the calculations for $1-F$ [based on (12) and (16)] do not show any discrepancy for small $\mu$, the only dose levels where $S$ is not effectively zero.

Figures 3(a) and 3(b) show, as anticipated, that the simplified theory becomes grossly inadequate as true double-stranded cutting becomes less dominant relative to nicking. The failure to model the creation of linear forms via the nicking mechanism guarantees this. So, it is in the declining part of the relaxed curve (with the effect that this decline has on the linear curve) where discrepancies are seen. The figures clearly show that the theories are coincident for small $\mu$, so our simplified theory applies to this region regardless of the $\mu$ to $\phi$ ratio. We use this fact in the next section.
9. A Mixed Starting Population
Returning to the general theory, we now investigate the situation where the initial pool of DNA molecules contains a mixture of forms. We concentrate firstly on the case where the mixture arose solely because the DNA had already undergone some nicking and cutting. (Here we interpret "cutting" liberally to include any random double-stranded breakage.) In section 11 we remark on the case where the relaxed molecules in the mixture may exist because of earlier topoisomerase activity.

FIG. 3. Proportions of (a) relaxed, and (b) linear forms amongst the non-fragmented molecules. Both (a) and (b) show a comparison between the exact theory [based upon eqns (9)-(12) and shown by solid curves] and the simplified theory [based upon eqns (13)-(16) and shown by the dashed curves]. Nicking and cutting take place simultaneously. A range of ratios $\mu$ to $\phi$ is shown. The proportions are plotted against $\mu$, the average number of nicks per molecule. The scale for $\mu$ is linear in $\ln(1+\mu)$. The initial mix is 100% supercoiled. The parameter $b$ is set at 0.005.

We envisage that a commercial preparation of DNA is to be used for a "nicking and cutting experiment" and that this preparation has had prior exposure to doses $\mu_0$ of nicking and $\phi_0$ of cutting. The values of $\mu_0$ and $\phi_0$ may not be known to the experimenter, nor of much interest per se, but failure to take account of the prior exposures will create difficulties in the interpretation of any subsequent nicking or cutting. Of course, purification of the DNA to 100% supercoiled is an option for the experimenter, but the mathematical theory can easily be modified to allow for any starting mix. The time-consuming purification step is not really needed.

The amount of prior exposure can be estimated by measuring the relative proportions of supercoiled, relaxed and linear in the initial mix (through running an agarose gel). For example, one may find these forms in relative proportions 75%, 15% and 10%. Since the supercoiled form is dominant, this indicates a low prior exposure and the applicability of (20) and (21) to estimate $\mu_0$ and $\phi_0$:
$$\phi_0 = 0.1/(1-0.1) = 1/9$$
$$\mu_0 = -\ln[7.5/9] = 0.182.$$
Thus, in this example one estimates that the DNA has had prior exposure to nicking at an average of 0.182 nicks per molecule and to cleaving at an average of 0.111 cuts per molecule. Formula (16) shows that a negligible fraction (0.57%) of the DNA is fragmented due to the prior exposure. Note that a knowledge of $b$ is not needed to reach these conclusions if $\mu$ is small.
After prior exposure, the mixture of DNA forms is treated for a time $t$ with nicking and/or cleaving enzymes. Thus, if $\lambda$ and $\rho$ are the nicking and cutting rates per molecule per unit time, then the total doses at time $t$ are
$$\mu = \mu_0 + \lambda t \tag{22}$$
$$\phi = \phi_0 + \rho t. \tag{23}$$
The general theory remains intact, with eqns (9)-(12) describing the proportions of molecules of the four forms, but now (22) and (23) are used to link dose with time. The simplified theory (13)-(21) also remains intact, but with (22) and (23) used. Figure 4 illustrates how the molecular proportions change in this new situation. It shows an example where $b = 0.005$, $\mu_0 = 0.2$, $\phi_0 = 0.1$ and $\rho = 0$. That is, the DNA has undergone some nicking and cleaving prior to the addition of just one enzyme (with a nicking action only). The time profiles of the reaction are qualitatively different from those in Fig. 1.
FIG. 4. The molecular proportions for (a) all four forms, and (b) the three non-fragmented forms relative to their sum. Here, the initial mix is not 100% supercoiled. The DNA molecules have incurred an average of 0.2 nicks and 0.1 cuts prior to the addition of an enzyme which nicks but does not cut. This means that the initial mix is 74.1% S, 16.4% R, 9.0% L and 0.5% F. The proportions are plotted against $\mu$, the average number of nicks per molecule. The scale for $\mu$ is linear in $\ln(1+\mu)$. The parameter $b$ is set at 0.005.
We see, in the general character of Fig. 4(b), the typical curves of many enzyme and drug studies, for example, Campbell & Jackson's (1980) curves for DNase I nicking of SV40 in the presence of magnesium, calcium or zinc ions. (Their curves are plotted as functions of time.) Note that the early part of the linear curve in Fig. 4(b) is horizontal. If the cutting rate $\rho$ is not zero, this curve is not horizontal for small $\mu$; it has positive slope. This aspect is also evident in many studies: Campbell & Jackson's (1980) study using DNase I with manganese or cobalt ions and various studies with bleomycin analogues (Lloyd et al., 1978; Huang et al., 1981, 1983; Mirabelli et al., 1980).
It is easily seen that the theory leading to eqns (9)-(12) does not depend critically upon the relationship between $\mu$ and time or on that between $\phi$ and time. Nicking and cutting may proceed at rates which vary with time in any fashion. These equations are therefore appropriate in situations where (in enzyme kinetic terminology) there is enzyme depletion, substrate depletion or competitive product inhibition. (Here the substrate is the collection of un-nicked sites and the product, the collection of nicked sites.) These points will be the subject of later work.
10. Freifelder & Trumbo's Formula
Freifelder & Trumbo (1969) have developed an approximate formula for the proportion of relaxed molecules in the situation where there is nicking alone. In our notation, it is
$$R(\mu) \approx (1 - b\mu/2)^{\mu/2} - e^{-\mu}. \qquad (24)$$
In their derivation they have not taken into account many aspects of statistical variation. They assume that each DNA molecule incurs the average number of nicks (rather than a Poisson distributed number). They assume that these are equally shared by the two strands (rather than distributed Binomially between them). For simplicity they assume that the nicks on strand 1 create non-overlapping taboo zones (ignoring the overlap and random nature of the vacant zone on strand 2). These assumptions, combined with some other minor inaccuracies, lead them to eqn (24).
The impact of these simplifying assumptions is not clear from Freifelder & Trumbo's work. Our mathematically exact result, formula (3), provides a method for checking the accuracy of their approximate expression, (24). Figure 5 shows that their expression is quite good for small b. We might expect this because the two equations, (3) and (24), are identical at b = 0 (they both equal $1 - e^{-\mu}$). Whilst large percentage errors occur for higher $\mu$ values (Fig. 5(b)), these apply mainly when the relaxed population has been significantly reduced in number. For b = 0.001 or smaller, their approximation is very good. But at b = 0.02 or higher, it seems safer to use the exact formula.
Values of b as high as 0.05 are currently realistic, as some small phages and plasmids have only 2000-4000 base pairs, whilst taboo zones as high as 109 bases (at elevated temperatures) are mentioned by Freifelder & Trumbo. The cloning vector PiAN7 has only 885 base pairs. So if this artificially constructed DNA molecule becomes widely used, b values of around 0.1 may be encountered. Figures 5(a) and 5(b) contain this extreme case. The exact theory is obviously needed in this situation.
Incidentally, Fig. 5(a) shows the impact of b on the exact curves of $R(\mu, \phi)$. The early parts of the curves do not depend upon b.
This is to be expected since the rapid rise in number of relaxed molecules takes place as the DNA gets its first nick (and clearly b is uninvolved). On average, many more nicks are needed to create a break so there is a time lag before relaxed molecules start to be lost in significant numbers. By this time the supercoiled population is near extinction, with relaxed numbers ceasing to rise. It is the loss of relaxed molecules that involves b. Figure 5(a) shows that the major differences between curves for different b values occur during this fall in numbers.
FIG. 5. Comparison, for a range of b, of the exact formula for the proportion of relaxed molecules with the approximate formula due to Freifelder & Trumbo (1969). In (a), the solid curves are from the exact eqn (3), whilst the dashed curves are from (24), Freifelder & Trumbo's approximation. In (b), the ratio of their formula to the exact is shown. The curves are plotted against $\mu$, the average number of nicks per molecule. The scale for $\mu$ is linear in $\ln(1 + \mu)$. The initial mix is 100% supercoiled. There is only a nicking action.
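Freifelder & Trumbo's approximation, eqn (24), is simple to evaluate. The Python sketch below is illustrative and assumes that eqn (24) reads $R(\mu) \approx (1 - b\mu/2)^{\mu/2} - e^{-\mu}$; the function name is hypothetical. It does not implement the exact formula (3), which is not restated in this section, so it cannot reproduce the comparison of Fig. 5.

```python
import math

def relaxed_freifelder_trumbo(mu, b):
    """Approximate proportion of relaxed molecules under nicking alone.

    Assumed reading of eqn (24): (1 - b*mu/2)**(mu/2) - exp(-mu).
    mu is the average number of nicks per molecule; b is the parameter b
    of the text. Raises ValueError when b*mu/2 >= 1, where the base of
    the power is no longer positive.
    """
    base = 1.0 - b * mu / 2.0
    if base <= 0.0:
        raise ValueError("approximation undefined for b*mu/2 >= 1")
    return base ** (mu / 2.0) - math.exp(-mu)

# At b = 0 the approximation reduces to 1 - exp(-mu), as noted in the text.
for mu in (0.5, 2.0, 10.0):
    print(mu, relaxed_freifelder_trumbo(mu, 0.0), 1.0 - math.exp(-mu))

# Effect of b at a moderately large dose.
for b in (0.001, 0.005, 0.02):
    print(b, round(relaxed_freifelder_trumbo(10.0, b), 4))
```

The guard on $b\mu/2 \ge 1$ is a choice made for this sketch: beyond that dose the assumed non-overlapping taboo zones would cover the whole strand, so the expression has no sensible value.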
11. Interaction with Topoisomerases
The relaxed band of a gel may contain molecules created from supercoiled forms by the action of some topoisomerase I, an enzyme that unwinds the supercoiling but leaves the DNA free of nicks. The two types of relaxed form can be separated by treatment with ethidium bromide or chloroquine when running the agarose gel. So it is possible to determine if topoisomerase activity is present during an experiment, or if relaxed molecules in an initial mix have been caused by topoisomerase. On the other hand, it is possible to account for topoisomerase activity mathematically. Equations (9) and (10) are modified only slightly whilst (11) and (12) remain unchanged. Equations (9) and (10) become
$$S(\mu, \phi, \theta) = e^{-\theta} S(\mu, \phi) \qquad (25)$$
$$R(\mu, \phi, \theta) = R(\mu, \phi) + (1 - e^{-\theta}) S(\mu, \phi) \qquad (26)$$
where the new S and R are expressed as functions of three parameters. The new parameter $\theta$ describes the rate of topoisomerase-DNA interaction and is analogous to $\mu$ and $\phi$, except that the interaction has an effect only if it is the first of all three types of interaction. The action of all three interactions is illustrated in Fig. 6(a). For both the solid and dashed curves, $\phi$ is set at $0.2\mu$ with b = 0.005 and S initially at 100%. For the dashed curve, $\theta = 0.3\mu$ whilst $\theta = 0$ for the solid curve. It is clear that the addition of simultaneous topoisomerase activity hastens the rise in R and the decline in S. The fall in R is delayed (and so R reaches a higher peak), but the decline in the two curves is the same. The topoisomerase plays no role once the S population is depleted. The solid and dashed curves for L and F coincide, so only the solid one is shown.
FIG. 6. Proportions of molecules plotted against $\mu$. In (a), the initial mix is 100% supercoiled. The solid curves show the case where $\phi = 0.2\mu$ and $\theta = 0$, whilst the dashed curves show the case where $\phi = 0.2\mu$ and $\theta = 0.3\mu$. The parameter $\theta$ indicates the level of topoisomerase I activity on the supercoiled population. In (b), we show the comparison when the initial mix is 74.1% S, 16.4% R, 9.0% L and 0.5% F as in Fig. 4. The solid curves show the case where all relaxed molecules in the initial mix are the result of nicking. The dashed curves assume that all relaxed molecules in this mix resulted from topoisomerase activity. The mix is subsequently exposed to an enzyme that nicks but does not cut. The parameter b is set at 0.005 in both (a) and (b).
The simplified theory is also modified in accordance with (25) and (26). Thus, eqns (13) and (14) become
$$S(\mu, \phi, \theta) = e^{-(\mu + \phi + \theta)}$$
$$R(\mu, \phi, \theta) = (1 - e^{-\mu}) e^{-\phi} + (1 - e^{-\theta}) e^{-(\mu + \phi)} = (1 - e^{-(\mu + \theta)}) e^{-\phi}.$$
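As a numerical check on these simplified expressions, the following Python sketch applies the modifications (25) and (26) to the simplified S and R and compares the result with the closed forms. The function names are hypothetical, and the S and R used are the simplified forms only, not the exact eqns (9) and (10).

```python
import math

def simplified_s_r(mu, phi):
    """Simplified-theory S and R without topoisomerase (b ignored)."""
    s = math.exp(-(mu + phi))
    r = (1.0 - math.exp(-mu)) * math.exp(-phi)
    return s, r

def with_topoisomerase(s, r, theta):
    """Eqns (25) and (26): a fraction 1 - exp(-theta) of S moves to R."""
    moved = (1.0 - math.exp(-theta)) * s
    return s - moved, r + moved

mu, phi = 1.0, 0.2          # phi = 0.2 * mu, as in Fig. 6(a)
theta = 0.3 * mu            # dashed curves of Fig. 6(a)

s, r = with_topoisomerase(*simplified_s_r(mu, phi), theta)

# Closed forms quoted in the text.
s_closed = math.exp(-(mu + phi + theta))
r_closed = (1.0 - math.exp(-(mu + theta))) * math.exp(-phi)

print(s, s_closed)
print(r, r_closed)
```

Writing (25) and (26) as a transfer from S to R makes it explicit that topoisomerase activity leaves the total S + R unchanged, which is why the L and F curves are unaffected.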
Thus the expressions for S and R are the same as before, except that $\mu + \theta$ replaces $\mu$. This is to be expected because the simplified model ignores all but the first nick; it makes nicking activity and topoisomerase activity equivalent. Only their total rate is important.
Figure 6(b) illustrates the case (already shown in Fig. 4(a)) where the initial mix is not 100% supercoiled. The solid curve shows the temporal development if it is assumed that all relaxed molecules in the initial mix were the result of nicking. The dashed curve assumes that they were all the result of prior topoisomerase activity. Once again, the effect is in the rising part of R and in the corresponding fall of S. Some difference is to be expected, since, under the former assumption, the relaxed molecules have a more advanced status of nicking.
We have had helpful discussions with Peter Molloy, Ruth Hall and Horace Drew.
REFERENCES
CAMPBELL, V. W. & JACKSON, D. A. (1980). J. Biol. Chem. 255, 3726.
COCKELL, M., RHODES, D. & KLUG, A. (1983). J. mol. Biol. 170, 423.
DREW, H. R. (1984). J. mol. Biol. 176, 535.
DREW, H. R. & TRAVERS, A. A. (1984). Cell 37, 491.
FELLER, W. (1965). An Introduction to Probability Theory and its Applications, New York: Wiley.
FREIFELDER, D. & TRUMBO, B. (1969). Biopolymers 7, 681.
HUANG, C. H., MIRABELLI, C. K., JAN, Y. & CROOKE, S. T. (1981). Biochemistry 20, 233.
HUANG, C. H., MIRABELLI, C. K., MONG, S. & CROOKE, S. T. (1983). Cancer Res. 43, 2849.
LASKOWSKI, M. (1971). In: The Enzymes (Boyer, P. D. ed.) London: Academic Press.
MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). Molecular Cloning: A laboratory manual. New York: Cold Spring Harbour.
MIRABELLI, C. K., HUANG, C. H. & CROOKE, S. T. (1980). Cancer Res. 40, 4173.
POVIRK, L. F., WOBKER, W., KÖHNLEIN, W. & HUTCHINSON, F. (1977). Nucleic Acids Res. 4, 3573.
SIEGEL, A. F. (1978). J. Appl. Prob. 15, 774.