Ring theory II. Fractional tandem model


J. theor. Biol. 61, 329-351

BARRY MARTIN DANCIS

Department of Biology, Temple University, Philadelphia, Pennsylvania 19122, U.S.A.

(Received 12 February 1975, and in revised form 26 November 1975)

DNA fragments partially digested by a 3'- or 5'-specific nuclease to produce single chain ends of opposite polarity will form a ring if the ends contain complementary sequences and are allowed to anneal. The frequency of rings can then be used as an assay to determine where and how identical repetitious sequences are arranged in the DNA. Thomas et al. (1973b) showed that all eucaryote chromosomes studied contain similar if not identical repetitious sequences clustered into regions called g-regions. To account for the observed ring frequency under different experimental conditions, Thomas, Zimm & Dancis (1973c) derived equations for two possible models of g-region organization. In the pure tandem model, the repetitious sequences are contiguous and occupy the entire g-region. In the intermittent repetition model, the repetitious sequences are simple copolymers and are irregularly arranged among non-repetitious sequences which are heterogeneous in length. In the present paper, the results of Thomas et al. (1973c) are extended to cover the fractional tandem model. In this model, adjacent repetitious sequences are separated by non-repetitious sequences of uniform length. In addition, the equations for both the pure tandem and intermittent repetition models are shown to be special cases of the fractional tandem model but not vice versa. The capabilities and limitations of an analysis of ring formation are demonstrated using data from Drosophila. Although it is not possible to rule out any of the three models, the analysis can limit the ranges of the parameters describing each of the models that are consistent with the data. Previous conclusions that the data could only be explained by a pure tandem model which lacks any intervening unique sequences (Bick, Huang & Thomas, 1973; Thomas et al., 1973b) are shown to be incorrect, in part because the equations for the fractional tandem model had not then been derived.
Thus ring theory equations can be used to show the presence of clusters of similar if not identical sequences from ring-forming experiments, but they may not be able to determine the exact spacing and arrangement of these sequences within the clusters.


1. Introduction

The ability of a DNA double helix with single strand ends to form a ring when annealed has shown the presence of densely clustered repetitious sequences in eucaryote DNA (a summary of the experiments is given in Thomas et al., 1973b). For each of two models of DNA organization, Thomas, Zimm & Dancis (1973c) derived equations to relate the frequency of rings to the frequency of repetitious sequences. In both models, the repetitious sequences were imagined to be localized within certain regions of finite length called g-regions, and every g-region contained multiple copies of a single sequence (Fig. 1). In the intermittent repetition model, the copies

FIG. 1. The regionally-repetitious chromatid (adapted from Fig. 2 of Thomas et al., 1973c). [Panel labels: double helix; pure tandem; fractional tandem; intermittent repetition.] The single line with folds represents a single chromatid organized into distinct regions called g-regions. The length of a given g-region is $g$ nucleotide pairs and the fraction of all nucleotides in g-regions is $\gamma$. The g-regions are characterized by containing copies of a single repetitious sequence. Two models are pictured for the organization of the repetitious sequences within the g-regions. In the tandem model, the repetitious sequences are located at regular intervals within the g-region. The distance from the beginning of one repetitious sequence to the beginning of the next is $s$ nucleotide pairs and is the length of one repetitious unit. At one end of each repetitious unit is a repetitious sequence $g'$ nucleotides long. Fractional tandems have non-repetitious sequences $s - g'$ nucleotides long separating the repetitious ones. The pure tandem is a special case where $s - g' = 0$. In the second or intermittent repetition model, the copies of the repetitious sequence are irregularly arranged within the g-region.

were distributed at random among non-repetitious or unique sequences. In the pure tandem model (previously called the tandem repetition model) the multiple copies were contiguous and there were no intervening unique sequences. The present paper will extend the results of Thomas et al. (1973c) to cover the fractional tandem model. In this model, the multiple copies of a repetitious sequence are distributed in a regular fashion so that within a g-region, adjacent copies are separated by blocks of unique sequences of the


same length. This paper will also show that the fractional tandem model can be used to describe both the intermittent repetition and pure tandem models by letting the lengths of the unique sequences from different g-regions be randomly distributed for the former and be equal to zero for the latter. Finally, the efficacy of measuring the ring-forming ability of DNA to determine its sequence organization will be determined using T7 and Drosophila DNA as examples.

2. Definitions

$r$: The number of nucleotides resected from one end of a DNA fragment. Usually taken as the average value determined by the fraction of acid-soluble nucleotides.

$b_0$: The minimum number of nucleotide pairs required to form a stable helix under the conditions of annealing.

$g$: The number of nucleotide pairs in a g-region. A g-region is a length of DNA containing multiple copies of a single repetitious sequence both within and at each end of the g-region.

$g'$: The length of a single copy of the repetitious sequence in the g-region.

$u$: The length of the largest block of nucleotides in a repetitious sequence not composed of repeats of smaller blocks.

$q$: The number of non-internally repeating blocks of nucleotides in a repetitious sequence, so that $q = g'/u$. If the repetitious sequence is a homopolymer, $u = 1$ and $q = g'$. If the repetitious sequence is internally unique, then $u = g'$ and $q = 1$.

$Q(u)$: The probability that the parts of the repetitious sequences in the two resected ends of a fragment are not complementary.

$s$: The length in nucleotides of the repetitious unit within a g-region. It is the distance from the beginning of one repetitious sequence to the beginning of the next. In a tandem model, the value of $s$ is the same throughout the g-region. In a pure tandem model $s = g'$, while in a fractional tandem model $s > g'$. In an intermittent repetition model, the value of $s$ is heterogeneous within a g-region.

$\bar{s}$: The number average of $s$.

$\alpha$: The fraction of a repetitious unit that is repetitious. It is equal to $g'/s$.

$l$: The duplex fragment length before resection, measured in nucleotide pairs.

$l'$: The "resected length", $l' = l - 2r + 2b_0 - 1$. Since $b_0$ is much smaller than $l$, $l'$ is the duplex fragment length after resection and can be determined by measurements of aqueous electron micrographs of the DNA.

$k$: The remainder after the fragment length is divided by the length of the repetitious unit, so that $l = ms + k$ where $m$ is a positive integer or zero and $(b_0 - 1) < k < (s + b_0)$.

$a$: The length of the overlap. It is the number of nucleotides in the two resected ends of a fragment that come from homologous parts of the repetitious unit: $a = 2r - k$.

$F_k$: The fraction of cyclizable fragments with a particular value of $k$ from a g-region of infinite length. Fragments of different length but the same value of $k$ will cyclize with the same frequency.

$F$: The fraction of cyclizable fragments from a g-region of infinite length where the fragments with the $s$ different possible values of $k$ are present in equal frequency.

$\bar{g}$: The number average of $g$.

$n$: The number of g-regions in the haploid genome.

$N$: The number of nucleotide pairs in the haploid genome.

$\gamma$: The fraction of all nucleotide pairs residing in g-regions. $\gamma = n\bar{g}/N$.

$g^*, n^*, \gamma^*$: Properties of all g-regions longer than $l - 2r + u + b_0 - 1$. Shorter g-regions will not be able to produce cyclizable fragments.

$\varepsilon$: The efficiency of ring formation. It is the number of rings formed divided by the number of fragments that could have formed rings if resection, annealing and electron microscopy conditions were appropriate.

$R$: Ring frequency. The observed number fraction of all fragments that are rings.

$\bar{g}_1, n_1, \gamma_1, s_1$: Properties of g-regions satisfying the case 1 conditions: $g > l + 2g' - s - 1$ and $s > r + g' - u/2 - 3b_0/2 - 1$.

When $s$ ...

$\Delta T_m$: The difference in $T_m$ between rings formed by pure and fractional tandem repeats.

$B$: A constant equal to the product of the minimum number of base pairs required for stable helix formation at a given temperature and the difference between the given temperature and the melting temperature of DNA of infinite length: $B = b_0 \Delta T_m$.

3. Methods and Materials

A model 9100B Hewlett Packard Calculator with an attached plotter (H.P. model 9125A) was programmed to give numerical and graphical solutions to the equations.

4. Derivation of Equations for the Fractional Tandem Model

(A) g-REGIONS OF INFINITE LENGTH

In this section, the equations for the fractional tandem model will be derived for the less complicated case where the genome is composed of infinitely long g-regions. The equations will then be modified in the next section to describe the more complicated case of a genome with finite g-regions.

Within a g-region, the length of each copy of the repetitious sequence is $g'$. The length of one copy plus one unique sequence, which together form one repetitious unit, is $s$. Initially we will assume that the repetitious sequence is not internally repetitious. When a g-region is twisted into a superhelix with all the copies of the repetitious sequence overlapping, it will appear as in Fig. 2(a). If multiple copies of the g-region are randomly broken into fragments, then a fragment of length $l$ can be aligned with the superhelix to appear in a top view as a ring with two tails of equal length [Fig. 2(b)]. Let $k$ equal the sum of the lengths of the two tails so that $l = ms + k$, where $m$ is any positive integer or zero. The range of $k$ will be determined by the length of the repetitious unit, $s$, and by the minimum number of base pairs necessary to form a stable helix, $b_0$. Thus $(b_0 - 1) < k < (s + b_0)$ for all $l$ greater than $(s + b_0 - 1)$. Obviously, when the value of $l$ is smaller than $(s + b_0)$, no rings can form.

Since the fragments are produced by random breakage of multiple copies of the g-region, fragments with the $s$ different values of $k$ will be present in equal frequency. In addition, for each value of $k$, the $s$ different circular permutations of the fragment sequence will also be present in equal frequency; that is, the frequency of fragments beginning with the first nucleotide of the repetitious unit will be equal to the frequency of fragments beginning with the second, third, or nth nucleotide of the repetitious unit. In what follows,

FIG. 2. Helical representation of a fractional tandem g-region. [Panel labels: side view; top view; circumference = $s$.] (a) A g-region is shown as a superhelix. One turn of the helix is equal to one repetitious unit, so that the repetitious sequences, shown as thickenings in the line, lie directly above one another. The lengths of the repetitious unit and sequence are $s$ and $g'$ respectively. (b) Fragment of g-region wound into a superhelix. In top view are shown two of the $s$ possible circular permutations of fragments with length $l$. $k$ is equal to the remainder after dividing $l$ by an integral number of $s$. Thus $k = l - ms$ where $m$ is an integer greater than or equal to zero. For the example used in the figure, $m = 3$. The range of $k$ is $b_0$ to $(s + b_0 - 1)$ inclusive, where $b_0$ is the minimum number of base pairs required for a stable helix. $k/2$ base pairs at each end have been displaced from the superhelix in the top view for clarity. The side view and the top view on the left are for the same permutation. (c) The same as (b) after resection when the length of the resection, $r$, is less than $k/2$. The thin lines show the region of potential helix formation and the dashed lines the resected ends. (d) Same as (c) when $2r > k$. (e) Same as (c) when $r \ge s/2 + b_0$.


we shall see that the ability to form rings, or cyclize, depends on the value of $k$. Fragments of different lengths which have the same value of $k$ will also have the same frequency of cyclization, $F_k$. When a fragment is partially resected with a 3'- or 5'-specific exonuclease to produce two single-strand ends $r$ nucleotides long, the length, $a$, of the region where the two resected ends overlap will be $2r - k$ [Fig. 2(c)]. Note that the maximum value of $a$ is $k$. When the value of $2r$ is larger than that of $k$, the overlap will appear as in Fig. 2(d). When the overlap contains $b_0$ or more contiguous nucleotides of the repetitious sequence, a stable helix can form to produce a ring. Of the $s$ circular permutations, only $(a + g' - 2b_0 + 1)$ of them will have at least $b_0$ nucleotides of the repetitious sequence in the overlap and be able to cyclize. The frequency of cyclization for fragments with a specific value of $k$ when $k \ge r$ is

$$F_k = \frac{1}{s}(a + g' - 2b_0 + 1) = \frac{1}{s}(2r - k + g' - 2b_0 + 1) \qquad (1)$$

and when $k < r$ is

$$F_k = \frac{1}{s}(k + g' - 2b_0 + 1). \qquad (2)$$
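Equations (1) and (2) can be explored numerically with a short sketch. The function names below are illustrative and not from the paper. The sketch assumes the overlap is $a = \min(2r - k, k)$, which is one reading of the statement that $a = 2r - k$ with a maximum value of $k$, and it ignores the two-overlap situation of Fig. 2(e).

```python
def split_length(l, s, b0):
    """Write l = m*s + k with b0 <= k <= s + b0 - 1 and return (m, k)."""
    if l < s + b0:
        # The text states that fragments shorter than s + b0 cannot form rings.
        raise ValueError("fragment shorter than s + b0 cannot form a ring")
    k = (l - b0) % s + b0
    m = (l - k) // s
    return m, k


def overlap(r, k):
    """Overlap a of the two resected ends: 2r - k, but never more than k (assumed reading)."""
    return min(2 * r - k, k)


def f_k(r, k, s, g_prime, b0):
    """Fraction of the s circular permutations that can cyclize, equations (1) and (2)."""
    a = overlap(r, k)
    if a < b0:
        return 0.0  # too few base pairs for a stable helix
    # F_k cannot exceed 1 (all permutations cyclize).
    return min(1.0, (a + g_prime - 2 * b0 + 1) / s)


if __name__ == "__main__":
    # Hypothetical values: s = 1000, g' = 200, b0 = 33, r = 100.
    print(split_length(3100, 1000, 33))   # m and k for a 3100-nucleotide fragment
    print(f_k(100, 150, 1000, 200, 33))   # k >= r branch, equation (1)
    print(f_k(100, 50, 1000, 200, 33))    # k < r branch, equation (2)
```

With these hypothetical values both branches give an overlap of 50 nucleotides, so the two calls to `f_k` return the same fraction.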

When the length of the overlap, $a$, is smaller than $b_0$, then, by definition, there will never be enough base pairs to form a stable helix and $F_k = 0$. For fragments with a specific value of $k$, all will cyclize (i.e. $F_k = 1$) when $(a + g' - 2b_0 + 1) \ge s$. A population of fragments, heterogeneous for the value of $k$, will completely cyclize when $(2r - k + g' - 2b_0 + 1) \ge s$ for the fragments with the largest value of $k$ (i.e. $k = s + b_0 - 1$). Thus $F$, the average value of $F_k$ for all values of $k$, will increase from 0 to 1 as $r$ increases from $b_0$ to $(s - g'/2 + 3b_0/2 - 1)$. In Table 1(a) are shown the equations which apply for given values of $r$ and $k$. When $r \ge s/2 + b_0$, some values of $r$ and $k$ will give an overlap in two places [Fig. 2(e)]. With two overlaps, the value of $F_k$ is independent of the value of $k$ and equal to $[2r - s + 2(g' - 2b_0 + 1)]/s$. The equations which apply under these conditions are shown in Table 1(b). When the sums of both Table 1(a) and (b) are determined by standard methods,

$$F = \frac{1}{s^2}\left[(r + g' - 2b_0 + 1)^2 - (g' - b_0)(g' - b_0 + 1)\right]. \qquad (3)$$

Equation (3) will be called the general solution for the fractional tandem model. In the remainder of this section, it will be shown how the general solution for the fractional tandem model can be modified to apply to both the pure tandem and intermittent repetition models.
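As a consistency check on the general solution, the sketch below compares the closed form of equation (3) with a direct average of the uncapped, single-overlap $F_k$ over the $s$ equally frequent values of $k$. The function names are illustrative, and the comparison is only meant for resections in the range $b_0 \le r < s/2 + b_0$, where the two-overlap case of Table 1(b) does not arise.

```python
def general_solution(r, s, g_prime, b0):
    """Equation (3): F for an internally unique repetitious sequence."""
    first = (r + g_prime - 2 * b0 + 1) ** 2
    second = (g_prime - b0) * (g_prime - b0 + 1)
    return (first - second) / s ** 2


def averaged_f_k(r, s, g_prime, b0):
    """Average the uncapped single-overlap F_k over k = b0 .. s + b0 - 1."""
    total = 0.0
    for k in range(b0, s + b0):
        a = min(2 * r - k, k)  # assumed reading of the overlap length
        if a >= b0:
            total += (a + g_prime - 2 * b0 + 1) / s
    return total / s


if __name__ == "__main__":
    # Hypothetical values: r = 100, s = 1000, g' = 200, b0 = 33.
    print(general_solution(100, 1000, 200, 33))
    print(averaged_f_k(100, 1000, 200, 33))
```

For the hypothetical values used here the two numbers agree, which supports reading equation (3) as the sum of equations (1) and (2) over $k$.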


Pure tandem model

The value of $F$ for a pure tandem model can be approximated by substituting $s$ for $g'$ in the general solution. This, however, will result in an overestimate of $F$ because, as stated above, when $(a + g' - 2b_0 + 1) > s$, $F_k = 1$ and not $(a + g' - 2b_0 + 1)/s$. The correct value of $F$ can be determined by setting 1 as the maximum value of each $F_k$. There is no single equation comparable to equation (3) which can be written to give the correct value of $F$. Instead it is necessary to determine if $(a + g' - 2b_0 + 1)$ is greater than $s$ for each value of $k$ by a reiterative process and then to use the appropriate value of $F_k$ in the calculation of $F$. This has been done using a Hewlett Packard Calculator Plotter and the results will be called the reiterated solution. In the pure tandem model, whenever the overlap is $b_0$ or more nucleotides, all the fragments will cyclize and $F_k = 1$. Since the minimum value of the overlap is $b_0$, $F_k$ is either zero or 1 for the pure tandem model. This simplifies the reiteration so that

$$F = 2(r - b_0)/s. \qquad (4)$$
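The simplification behind equation (4) can be illustrated by counting, for a pure tandem, the values of $k$ whose overlap reaches $b_0$. This is a sketch with illustrative function names, again assuming the overlap is $\min(2r - k, k)$; it is not the calculator program used in the paper.

```python
def pure_tandem_by_counting(r, s, b0):
    """Fraction of the s values of k (b0 .. s + b0 - 1) whose overlap is at least b0."""
    count = sum(1 for k in range(b0, s + b0) if min(2 * r - k, k) >= b0)
    return count / s


def pure_tandem_closed_form(r, s, b0):
    """Equation (4), F = 2(r - b0)/s, limited to the range 0..1."""
    return max(0.0, min(1.0, 2 * (r - b0) / s))


if __name__ == "__main__":
    # Hypothetical values: s = 1000, b0 = 33.
    for r in (20, 100, 300, 533):
        print(r, pure_tandem_by_counting(r, 1000, 33), pure_tandem_closed_form(r, 1000, 33))
```

The count and the closed form differ by at most $1/s$ in this sketch, because the discrete count contains $2(r - b_0) + 1$ values of $k$ before saturation.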

If there are pure tandem g-regions with repetitious units of different lengths, then $F = 2(r - b_0)/\bar{s}$, where $\bar{s}$ is the number average length of the repetitious units. This result is identical to that obtained by Thomas et al. (1973c) for the tandem repeat model [see their equation (25)]. Figure 3 shows the difference between the general and reiterated solutions for the pure and fractional tandem models. For both solutions $\alpha$, the ratio of $g'$ to $s$, has a significant effect on the shape of the curves. As $\alpha$ increases, the curves become less concave and smaller resections are required for maximum cyclization (i.e. $F = 1$). The two solutions differ only when $(r + g' - 2b_0 + 1)$ is greater than $s$. Even then, there is at most a 20% error. Thus, for most applications, the general solution can be used for the pure and fractional tandem models.

Intermittent repetition model

The general solution as written in equation (3) has two components. The first component, $(r + g' - 2b_0 + 1)^2/s^2$, is the probability that both resected ends contain at least $b_0$ contiguous nucleotides of the repetitious sequence; the second component is the probability that the parts of the repetitious sequences exposed in the two ends cannot form $b_0$ or more contiguous base pairs. Both probabilities assume that the two sequences exposed at the ends of each fragment are independent of each other. If $Q(u)$ is the probability that


FIG. 3. Comparison of general and reiterated solutions for the fractional tandem model. The frequency of cyclizable fragments ($F$) is plotted as a function of resection, $r$ (in nucleotides and as a fraction of $s$), for different values of $\alpha$ (the fraction of the repetitious unit that is repetitious; $\alpha = g'/s$). The general solution (-----) is an approximation of the reiterated solution (---). The pure tandem is the special case of the fractional tandem when $\alpha = 1$. See Fig. 1 for explanation of symbols.

the two sequences are not complementary, then the general solution for the fractional tandem becomes

$$F = (r + g' - 2b_0 + 1)^2/s^2 - Q(u) \qquad (5)$$

where $Q(u)$ is a function of the amount of internal repetition in the repetitious sequence. The value of $Q(u)$ in equation (5) can be calculated as follows. Let $g' = qu$, where $u$ is the length of the largest block of nucleotides not composed of repeats of smaller blocks. The number of such blocks in a repetitious sequence of length $g'$ is $q$, and the value of $q$ is some integer between 1 and $g'$ inclusive. When $u$ is greater than or equal to $b_0$ and less than $r$, $Q(u) = (u - b_0)(u - b_0 + 1)/s^2$. In the derivation of equation (3), it was assumed that the repetitious sequence was not internally repetitious. Therefore $g' = u$, $Q(u) = (g' - b_0)(g' - b_0 + 1)/s^2$, and equation (5) becomes the general solution for the fractional tandem [equation (3)]. If $u$ is smaller than $b_0$, $Q(u) = u(u - 1)/s^2$; if $u = 1$, the repetitious sequence is a homopolymer, $Q(u) = 0$, and

$$F = (r + g' - 2b_0 + 1)^2/s^2. \qquad (6)$$
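The effect of internal repetition on equation (5) can be sketched as below. The function names are illustrative. The sketch applies the $u \ge b_0$ expression for $Q(u)$ without checking the additional condition $u < r$ stated in the text, and it does not cap $F$ at 1.

```python
def q_noncomplementary(u, s, b0):
    """Q(u): probability that the exposed parts of the two ends are not complementary."""
    if u >= b0:
        return (u - b0) * (u - b0 + 1) / s ** 2
    return u * (u - 1) / s ** 2  # includes the homopolymer case u = 1, where Q = 0


def f_with_internal_repetition(r, s, g_prime, u, b0):
    """Equation (5): F for a repetitious sequence built from q = g'/u blocks of length u."""
    if g_prime % u != 0:
        raise ValueError("g' must be an integer multiple of u")
    return (r + g_prime - 2 * b0 + 1) ** 2 / s ** 2 - q_noncomplementary(u, s, b0)


if __name__ == "__main__":
    # Hypothetical values: r = 100, s = 1000, g' = 200, b0 = 33.
    print(f_with_internal_repetition(100, 1000, 200, 200, 33))  # internally unique, u = g'
    print(f_with_internal_repetition(100, 1000, 200, 1, 33))    # homopolymer, equation (6)
```

With $u = g'$ the result reduces to equation (3), and with $u = 1$ it reduces to equation (6), so the homopolymer value is the larger of the two.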


To simplify the derivations that follow, $(u - b_0)(u - b_0 + 1)/s^2$ will be used as the value of $Q(u)$ in equation (5). The equations corresponding to a homopolymer repetitious sequence ($u = 1$) can then be obtained by substituting $b_0$ for $u$. As mentioned above, both components of equation (5) assume that the two sequences exposed at the ends of each fragment are independent of each other. In the derivation of equation (3), which gave rise to equation (5), this independence resulted partly from the heterogeneity in the length of the fragments. It could have resulted instead from a heterogeneity between g-regions in the length of the repetitious unit. Then, fragments from each g-region would have a particular value of $F_k$ and fragments from all g-regions would have all values of $F_k$. The average fraction of cyclizable fragments would still be given by $F$ in equation (5), except that $\bar{s}$, the number average length of the repetitious units, would replace $s$ in the denominator. Another cause of the independence of the sequences in the two fragment ends could be the heterogeneity within g-regions in the length of the repetitious unit. That would occur if the g-regions were intermittent repetitions (see Fig. 1). Consequently, equations for the fractional tandem and the intermittent repetition are identical except that $\bar{s}$ represents the number average value of $s$ between g-regions for the former and within g-regions for the latter. Thomas et al. (1973c) also derived equations for the intermittent repetition g-regions. They included in their definition the assumption that the repetitious sequences were homopolymers and derived equation (6). Since the organization of the g-region is what truly distinguishes the intermittent repetition from other models, the composition of its repetitious sequences will no longer be part of its definition.
An intermittent repetition g-region will, from now on, consist of multiple copies of a single repetitious sequence distributed in an irregular fashion among unique sequences regardless of the internal homogeneity of the repetitious sequence. The effect of the composition of the repetitious sequence on the fraction of cyclizable fragments can be seen in Fig. 4. Cyclization occurs more readily when the repetitious sequence is a homopolymer than when it is a heteropolymer. If $\alpha$, the ratio of $g'$ to $s$, is small, there is very little difference in the homopolymer and heteropolymer curves. As $\alpha$ increases, the differences become significant. Extrapolation of the homopolymer curves to the ordinate gives positive values of $F$, while it gives zero or negative values of $F$ for the heteropolymer. Also, maximum cyclization with homopolymer sequences is reached at shorter resections than with heteropolymer sequences of the same length. In this section, it has been shown how the general solution of the fractional tandem can be used to describe the fractional tandem and, with modification, the intermittent repetition and pure tandem models of chromosome organization for g-regions of infinite length. In the next section, the equations will be modified to describe g-regions of finite length.

FIG. 4. The effects of the composition of the repetitious sequence on ring frequency. Ring frequency, plotted against $r$ (nucleotides), when the repetitious sequence is a homopolymer (- - -) or internally unique (-), calculated from equation (6) and the general solution of equation (3) respectively. See Fig. 1 for explanation of symbols.

(B) g-REGIONS OF FINITE LENGTH

Whenever the length of the resected fragment is long compared to the length of the g-region ($l - 2r + b_0 + u > g$), there will not be sufficient nucleotides of the g-region exposed in the resected ends to form a ring. Similarly, whenever the fragment length is short compared to the length of the repetitious unit ($l < s + b_0$), there is effectively a maximum of one copy of a repetitious sequence in each fragment and again rings cannot form. Thus only g-regions which satisfy the conditions of $(g - u + 2r - b_0) \ge l \ge (s + b_0)$ can have their sequences in cyclizable fragments. For such g-regions the ability to form rings can be divided into two cases:

Case 1. ($g > l + 2g' - s - 1$ and $s > r + g' - u/2 - 3b_0/2 - 1$) Because the resections are relatively short, not all the fragments which come entirely from such g-regions will cyclize. The ring frequency will be a


function of $F_k$ for similar g-regions of infinite length and can be calculated with the following assumptions: (i) all values of $s$ are smaller than all values of $g$; (ii) the distribution of the number of g-regions with a given $s$ is independent of the length of the g-region; (iii) the sequences of the g-region exposed at the ends of each fragment by resection are independent of each other. The independence can result from heterogeneity in the size of the repetitious units within and/or between g-regions and/or heterogeneity in the size of the fragments. Assumption (iii) permits the value of $F$ in equation (5) to be used as the average value of all $F_k$ for all g-regions satisfying the limitations of case 1. It also permits the equations to be applied to the intermittent repetition as well as the fractional and pure tandem models. The fraction of all the fragments which form rings due to g-regions satisfying the case 1 conditions will be

$$R_1 = \varepsilon\gamma_1(1 - l^+/\bar{g}_1)F \qquad (7)$$

where $\gamma_1$ is the fraction of the genome which contains such g-regions, $\bar{g}_1$ is the number average length of such g-regions, $l^+$ is $(l + g' - \bar{s}_1 - r - b_0/2)$, $(1 - l^+/\bar{g}_1)$ is a correction for fragments which contain both g-region and non-g-region sequences, $F$ is the same as in equation (5) and $\varepsilon$ is the efficiency of forming and scoring rings. The derivation of equation (7) is similar in form to equations (3)-(12) in Thomas et al. (1973c) and is left to the reader.

Case 2. ($g < l + 2g' - s$ and/or $s < r + g' - u/2 - 3b_0/2$) Because of the large resections and/or size of the fragments, whenever a fragment contains at least $b_0$ nucleotides of such g-region sequences in both resected ends, it will be able to cyclize. Thus the fraction of all the fragments which can form rings of g-region sequences satisfying the case 2 conditions will be

$$R_2 = \varepsilon\gamma_2(1 - l'/\bar{g}_2) \qquad (8)$$

where $l'$ is equal to $(l - 2r + 2b_0 - 1)$. As $(r + g' - u/2 - 3b_0/2)$ approaches $s$ in size, equation (7) approaches equation (8) because $F$ goes to 1 and $l^+$ approximates $l'$. The observed ring frequency, $R$, will be the sum of equations (7) and (8) so that

$$R = R_1 + R_2 = \varepsilon\gamma_1(1 - l^+/\bar{g}_1)F + \varepsilon\gamma_2(1 - l'/\bar{g}_2). \qquad (9)$$

5. Discussion

The purpose of deriving the equations in the first part of this paper is to be able to determine the amount, distribution, and organization of repetitious sequences in DNA from cyclization experiments. It has already


been shown how to determine the number and length of g-regions from such experiments (Thomas et al., 1973c). In what follows, it will be shown how to determine the internal organization of the g-regions. At small resections, most of the fragments which contain only g-region sequences will not be cyclized, and the value of $F$ will be much less than 1. As a result, almost none of the g-regions satisfy the case 2 conditions and equation (9) reduces to

$$R = \varepsilon\gamma^*(1 - l^+/\bar{g}^*)F \qquad (10)$$

where $\gamma^*$ is the fraction of the DNA in g-regions longer than $(l - 2r + b_0 + u)$ and $\bar{g}^*$ is the number average length of such g-regions. Except for the calculation of $F$, the differences between equation (10) and equations (17) and (26b) of Ring Theory I (Thomas et al., 1973c) are small and will be ignored. In Ring Theory I the value of $F$ was not derived, but was estimated to be the product of $\alpha$ and the value of $F$ for a pure tandem model [i.e. $2g'(r - b_0)/s^2$]. When compared to the derived value of $F$ from the reiterated and general solutions of the fractional tandem, this estimate is inaccurate except when $\alpha$ is approximately 1 and/or $r$ is much smaller than $s$ ($r < 200$ in Fig. 3). As the resection is increased, there will be a rapid increase in the value of $R$ as the value of $F$ increases to 1. Finally, when $r = (s - g' + u/2 + 3b_0/2 + 1)$, $F = 1$ and all the fragments which came entirely from a g-region will cyclize. Now the value of $R_1$ in equation (9) is negligible and the observed ring frequency will be

$$R = \varepsilon\gamma^*[1 - l'/\bar{g}^*]. \qquad (11)$$
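The two limiting forms, equations (10) and (11), can be written as a small sketch. The function and parameter names are illustrative, all numerical values are hypothetical, and $l^+$ is passed in directly rather than computed so that the sketch does not depend on its exact definition.

```python
def resected_length(l, r, b0):
    """l' = l - 2r + 2*b0 - 1, the resected length from the Definitions."""
    return l - 2 * r + 2 * b0 - 1


def ring_frequency_small_resection(eff, gamma_star, g_bar_star, l_plus, f):
    """Equation (10): R when F is well below 1 and case 2 g-regions are negligible."""
    return eff * gamma_star * (1 - l_plus / g_bar_star) * f


def ring_frequency_saturated(eff, gamma_star, g_bar_star, l, r, b0):
    """Equation (11): R once F has reached 1."""
    return eff * gamma_star * (1 - resected_length(l, r, b0) / g_bar_star)


if __name__ == "__main__":
    # Hypothetical values: efficiency 0.8, half the DNA in g-regions averaging 10000 pairs.
    print(resected_length(3000, 500, 33))
    print(ring_frequency_small_resection(0.8, 0.5, 10000, 2500, 0.25))
    print(ring_frequency_saturated(0.8, 0.5, 10000, 3000, 500, 33))
```

Dividing an observed $R$ by the saturated value gives the normalized quantity that is compared with $F$ in the discussion of graphs of $F$ versus $r$.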

The definitions of $\gamma^*$ and $\bar{g}^*$ are the same as in equation (10). This equation is identical to equation (12) in Ring Theory I with the exception that the lower limit of $g$ is $l - 2r + u + b_0 - 1$ and not $l - 2r + 2b_0 - 1$, a small difference that can usually be ignored. Further resection will only affect fragments which contain non-g-region sequences at one or both ends. More of them will cyclize, but the rate of increase, $dR/dr$, will be much smaller than before [i.e. when $F < 1$ and $R = \varepsilon\gamma^*(1 - l^+/\bar{g}^*)F$]. When $l$ is large compared to most values of $s$, $dR/dr$ will be almost zero. Then, a graph of $F$ versus $r$ will accurately show the changes in ring frequency with increasing resection when the maximum value of $R$, or $R_{max}$, has been normalized to one [i.e. $F = R/R_{max} = R/\varepsilon\gamma^*(1 - l'/\bar{g}^*)$]. When the solutions for the fractional tandem were derived, it was assumed that $\bar{s}$, the average value, could be used to describe the effects of all the repetitious units as a whole whenever the length of the repetitious


unit was heterogeneous. In addition, it was assumed that the length of the repetitious sequence and the number of smaller, internally non-repetitious blocks making up the repetitious sequences were the same for all repetitious units. In vivo, none of these assumptions may be correct. In Fig. 5 the effects are shown of certain kinds of heterogeneity in the repetitious sequence and/or the repetitious unit on the value of $F$. Since the size of $s$ is usually unknown, fractional tandems can only be detected when the curves appear concave [compare Fig. 5(a) and (b)]. If the DNA contains fractional tandems by themselves or mixed with long pure tandems [Figs 5(a), (d) and 3], the curves do appear concave and the presence of fractional tandems could be inferred. The accuracy of ring frequency data is such, however, that the presence of DNA with pure tandem repeats about 1000 nucleotides or shorter tends to obscure the presence of the fractional tandems [Fig. 5(c) and (f)]. Thus, in many cases, a graph of $F$ versus $r$ cannot be used to distinguish between pure and fractional tandems, especially if there is more heterogeneity in the g-regions than in our examples. Such a graph, however, might be used to determine the limitations of $s$ and $g'$ as follows: an abrupt decrease in the slope of the curve will occur when $F$ just equals unity for one of the components [arrows in Fig. 5(a) and (b)]. For both pure and fractional tandems, this occurs when $r = s - g' + u/2 + 3b_0/2$.† For intermittent repetitions within a g-region, and for fractional tandems between g-regions, there will be a sharp decrease in slope if $s$ is narrowly distributed or a gradual decrease if it is broadly distributed. The decrease in slope can be observed in a graph of ring frequency versus resection ($R$ v. $r$), and will have the same significance as the decrease in a graph of $F$ v. $r$.
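The resection at which the slope is expected to break can be tabulated for candidate parameter sets with the sketch below. The function names and the parameter values are illustrative only. The first function implements $r = s - g' + u/2 + 3b_0/2$ as given in the text, and the second implements the pure tandem value $s/2 + b_0$ that the paper's footnote describes as the true value.

```python
def saturating_resection(s, g_prime, u, b0):
    """Resection at which F reaches 1 for a tandem component: s - g' + u/2 + 3*b0/2."""
    return s - g_prime + u / 2 + 3 * b0 / 2


def pure_tandem_exact(s, b0):
    """Pure tandem value s/2 + b0 quoted in the paper's footnote."""
    return s / 2 + b0


if __name__ == "__main__":
    # Hypothetical components, each internally unique (u = g'), with b0 = 33.
    print(saturating_resection(1000, 200, 200, 33))  # fractional tandem, s = 1000, g' = 200
    print(saturating_resection(125, 125, 125, 33))   # pure tandem, s = g' = u = 125
    print(pure_tandem_exact(125, 33))                # footnote value for the same pure tandem
```

For a pure tandem the two expressions differ by $b_0/2$, which is the size of the approximation the footnote refers to.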
In fact, whenever the value of $(1 - l'/\bar{g}^*)$ decreases significantly during the course of the resection, the value of $F$ cannot be calculated directly, and changes of slope are most accurately seen in a graph of $R$ v. $r$. In practice, maximum cyclization requires larger resections than expected. Pyeritz & Thomas (1973) used T7 DNA, which contains a terminal repeat of 260 nucleotide pairs. Assuming that the repeat is not internally repetitious, the maximum cyclization should occur when $r = (s + b_0)/2 \approx 150$ nucleotides and not the 300 nucleotides indicated by their data. Thus values of $s$, $g'$ and $u$ are probably overestimated by a factor of two or more when based on a graph of ring frequency versus resection ($R$ v. $r$). Most cyclization experiments, however, do not use a DNA as homogeneous as T7 DNA. In Fig. 6 are drawn the data points for Drosophila melanogaster (C. S. Lee, personal communication). Similar results have been obtained

† In the pure tandem where $s = g' = u$, this approximation will give $r = s/2 + 3b_0/2$, which is not significantly different from the true value of $r = s/2 + b_0$.

FIG. 5. Ring frequency when g-regions of two different compositions are present in equal amounts, plotted against $r$ (nucleotides) in panels (a) to (f). The scale on the right is the ring frequency for the individual components alone (- - -) and on the left for the composite (-). The value of $F$ was calculated from the reiterated solution of equation (3). The magnitudes of $s$ and $g'$ in nucleotides are shown in parentheses ($s$; $g'$). The figures show different combinations of long and short, pure and fractional tandem g-regions. Arrows showing where abrupt changes of slope occur when one of the components becomes completely cyclized are included only in Fig. 5(a) and (b). See Fig. 1 for explanation of symbols.

[Figure 6: ring frequency plotted against resection, r (nucleotides); plot not reproduced.]

FIG. 6. Calculated curves for some models of chromosome organization to account for the observed ring frequency in Drosophila melanogaster DNA fragments. Calculated curves of fractional and pure tandem models based on the reiterated solution of equation (3) with b0 = 33 (-). Calculated curves based on the fractional tandem model preferred by Bonner & Wu (1973) have s = 900 and g' = 150. They also suggest that b0 = g' = 150 (- - -). Laird et al. (1973) suggest that there are three to five different repetitious sequences 150 nucleotide pairs long that are serially repeated with adjacent sequences separated by 750 nucleotide pairs of non-repeating DNA. The curve for five different repetitious sequences is shown (-.-.). Calculated curves of a homopolymer fractional tandem with b0 = 33 (. . .). The data points shown are from C. S. Lee (personal communication). See Fig. 1 for explanation of symbols.

with D. virilis polytene DNA fragments (Lee & Thomas, 1973). Calculated curves for several different models of chromosome organization are also shown in the figure. Bonner & Wu (1973) proposed that g' = 150 and s = 900 for D. melanogaster. Clearly their values for s and g' do not fit the data. Laird et al. (1973) suggested that there are three to five different repetitious sequences sequentially arrayed in each g-region with g' = 150 and s = 900. A fractional tandem satisfying Laird's parameters gives an even poorer fit. The other values used in the figure give better fits, but the data are not sufficiently accurate to determine which values or combination of values gives the best fit. In addition to their values of s and g', Bonner & Wu (1973) proposed but did not substantiate that the upper estimate for the minimum number of base pairs necessary for stable helix formation,


b0, is about 150. Based on the results reported by Thomas & Dancis (1973a), b0 was estimated to be about 33 base pairs for rings formed 25°C below the melting temperature of the DNA, or Tm. Using Bonner & Wu's value of b0 in equation (3), their values of s and g' give an even poorer fit than before (Fig. 6). Since changing b0 primarily affects the x-intercept, only if b0 is less than 33 might a better fit result.

Consistent with the T7 experiments, and based upon the results of other experiments, it appears that the estimates of s in Fig. 6 are too high. Schachat & Hogness (1973) analyzed D. melanogaster rings by renaturation kinetics, while Peacock et al. (1973) first isolated satellite and mainband DNA and then determined the ability of each to form rings. Both groups concluded that for whole D. melanogaster DNA such as used in Fig. 6, at least 50 to 75% of the rings come from satellite sequences. More recently, Hamer & Thomas (1975) reached a similar conclusion when they found that only satellite DNA and not DNA with longer repetitious sequences remained after digestion with restriction endonucleases. Peacock et al. (1973) estimated the lengths of the repetitious units, s, to be about 40 nucleotides or less in these satellites. The overestimates of s in Fig. 6 are probably due in part to two properties of the satellite sequences described by Peacock. First, there is extensive mismatching when some of the satellites are denatured and reannealed, indicating that the repetitious sequences are similar but not identical. Fragments of such satellites would require longer resections to form a stable helix. Second, the presence of self-complementary sequences, which can form a helix in a resected end, can reduce the number of nucleotides available for cyclization and therefore also require longer resections for maximum cyclization. Thus graphs of F v. r and R v.
r will yield values of s overestimated by about a factor of two when the repetitious sequences are identical and even higher when they are heterogeneous and/or self-complementary. The graphs can still be used, however, to exclude certain models of sequence organization whenever the value of s predicted by a particular model is larger than that estimated from the graph. Even though Drosophila DNA does contain repetitious sequences interspersed by longer unique ones, as stated in the models of Bonner & Wu and of Laird et al., such sequences are not responsible for the rings seen in Fig. 6. In addition, it still remains to be demonstrated whether the adjacent repetitious sequences observed by Bonner & Wu are similar and thereby form fractional tandem g-regions.

Equation (3) can also be used to estimate the ring frequency at elevated temperature. Thomas & Dancis (1973a) have shown that b0 is equal to B/ΔTm, where B is an experimentally determined constant equal to about 820 and ΔTm is the difference between the melting temperature of infinitely long DNA and the temperature of the solution. Thus R/R0, the fraction of the


rings remaining after the temperature has been raised to some value T, will be

$$\frac{R}{R_0} = \frac{\left[(r + g' - 2B/\Delta T_m + 1)^2 - (g' - B/\Delta T_m)(g' - B/\Delta T_m + 1)\right]/s^2}{\left[(r + g' - 2b_0 + 1)^2 - (g' - b_0)(g' - b_0 + 1)\right]/s^2}. \tag{12}$$

When 820/ΔTm is greater than the value of g', R/R0 = 0. Since the general solution for the fractional tandem was used to derive equation (12), the equation is most accurate when a maximum of about one repetitious sequence is exposed by resection (more accurately when r < s - g' + 2b0). With that restriction, equation (25) is independent of the length of the repetitious unit, s. The comparable equation for the pure tandem is equation (A12) from Thomas & Dancis (1973a):

$$\frac{R}{R_0} = \frac{r + \tfrac{1}{2} - B/\Delta T_m}{r + \tfrac{1}{2} - b_0}. \tag{13}$$

The predicted melting curves of rings from g-regions with different values of g' are shown in Fig. 7. The temperature at which half the rings have

[Figure 7: plot not reproduced; the insert lists ΔT½ (°C) for each value of g'.]

FIG. 7. The effect of the length of the repetitious sequence, g', on thermal stability. Profiles for fractional (- - -) and pure (-) tandem models are calculated from equations (12) and (13) respectively after modifications to represent a heterogeneous population whose Tm has a standard deviation of 5°C. The difference in temperature in degrees centigrade (ΔT½) between the pure and fractional tandem when half the rings have linearized (T½) are shown in the insert along with g', the length of the repetitious sequence in nucleotides. The data is for Drosophila virilis DNA and is from Fig. 2(a) of Lee & Thomas (1973). Both the data and the curves assume r = 450 and b0 = 33 nucleotides.
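The temperature dependence described by equations (12) and (13) can be explored numerically. The following is a minimal sketch, not the procedure used to draw Fig. 7: it omits the modification for a heterogeneous population, it does not check the restriction r < s - g' + 2b0, and it reads the typeset equations as having a squared first term in (12) and a ½ term in (13). All function names are hypothetical, and clamping the pure tandem result at zero is an added assumption.

```python
B = 820.0  # experimentally determined constant, "about 820" in the text


def min_helix_length(delta_tm: float) -> float:
    """b0 = B / dTm: minimum stable helix length at delta_tm degrees C below Tm."""
    return B / delta_tm


def _bracket(r: float, g: float, b: float) -> float:
    # Bracketed term of equation (12); the common factor 1/s**2 cancels in R/R0.
    return (r + g - 2 * b + 1) ** 2 - (g - b) * (g - b + 1)


def fraction_remaining_fractional(r, g, delta_tm, delta_tm_ref=25.0):
    """R/R0 for a fractional tandem, equation (12) as read here."""
    b_hot = min_helix_length(delta_tm)
    if b_hot > g:  # the text states R/R0 = 0 once B/dTm exceeds g'
        return 0.0
    b_ref = min_helix_length(delta_tm_ref)
    return _bracket(r, g, b_hot) / _bracket(r, g, b_ref)


def fraction_remaining_pure(r, delta_tm, delta_tm_ref=25.0):
    """R/R0 for a pure tandem, equation (13) as read here (clamped at zero)."""
    b_hot = min_helix_length(delta_tm)
    b_ref = min_helix_length(delta_tm_ref)
    return max(0.0, (r + 0.5 - b_hot) / (r + 0.5 - b_ref))


if __name__ == "__main__":
    r = 450  # resection assumed in Fig. 7, nucleotides
    for g in (50, 100, 150):
        for d in (25.0, 15.0, 10.0, 5.0):
            frac = fraction_remaining_fractional(r, g, d)
            pure = fraction_remaining_pure(r, d)
            print(f"g'={g:3d} dTm={d:4.1f}  fractional={frac:.3f}  pure={pure:.3f}")
```

With the reference condition set to 25°C below Tm, `min_helix_length` gives 32.8, which matches the value of about 33 base pairs quoted for b0.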


linearized, T½, may be used to distinguish between a pure tandem and a fractional tandem with a short repetitious sequence. When the size of the repetitious sequence, g', is about 150 nucleotides or more, however, the data may not be accurate enough to distinguish between the two models. The melting data of Lee & Thomas (1973) for D. virilis rings are also shown in Fig. 7. The stability of some of the rings is even higher than that expected for pure tandem rings. Equations (12) and (13), however, are most accurate when the resection is smaller than the repetitious unit. Since Drosophila ring-forming DNA is mostly satellite, the resection of 450 nucleotides is much greater than the seven nucleotide length of the satellite repetitious unit (Gall & Atherton, 1974). The resected ends of fragments from such DNA can form a helix in more than one way and the resulting rings would have a higher than expected thermal stability. T7 ring DNA, which is much more homogeneous than the total Drosophila DNA used in Fig. 7, shows excellent agreement with the theoretical curves throughout the course of the melt [see Fig. 2(a) in Lee & Thomas, 1973]. Thus the melting curves of rings can only be used to distinguish between pure and fractional tandem g-regions when the repetitious sequence, g', is small, and that distinction may be obscured when such repetitious sequences are themselves internally repetitious.

Previous use of ring analysis to determine the organization of repetitious sequences resulted in the conclusion that the data were only consistent with pure tandem g-regions containing repetitious units about 1000 nucleotides or longer (Bick et al., 1973; Thomas et al., 1973b). The remainder of this report will show why that conclusion is no longer tenable. The arguments in favor of pure tandem repeats are partly based on the results of Bick et al. (1973).
They estimated that the thermal stability of rings formed by fractional tandem repeats with a repetitious sequence 100 nucleotides long should be about 8.2°C lower than rings formed by pure tandem repeats. Based on the present results, this value should only be 6.3°C. Thus, they underestimated the thermal stability of fractional tandem rings by almost 25%. In addition, it was not known at the time that the thermal stability data are consistent with all three models of chromosome organization whenever the repetitious sequences are 300 nucleotides or longer.

A second argument comes from measurements of the closure length. When resected fragments anneal, the region where they anneal is called the closure. Often the ends of the closure will be denoted by unannealed single chains (whiskers) which are visible when the electron microscope grids are prepared with formamide. The distance between the whiskers should be equal to the closure length. The maximum closure length of pure tandems is only a function of the resection, r, while for fractional tandems it is at most equal to the length of the repetitious sequence, g'. For both, the minimum value is b0.
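The closure-length limits stated above can be written as a small lookup. This is a sketch under stated assumptions: the function name and the model labels are hypothetical, the pure tandem maximum is taken to be r itself (the text says only that it is a function of r), and the example values r = 450, g' = 150 and b0 = 33 are illustrative.

```python
from typing import Tuple


def closure_length_range(model: str, r: float, g_prime: float, b0: float) -> Tuple[float, float]:
    """Return (minimum, maximum) closure length in nucleotides for a given model."""
    if model == "pure":
        # Assumption: the maximum closure equals the resection r.
        return (b0, r)
    if model == "fractional":
        # The closure is at most the length of the repetitious sequence, g'.
        return (b0, g_prime)
    raise ValueError(f"unknown model: {model!r}")


print(closure_length_range("pure", r=450, g_prime=150, b0=33))        # (33, 450)
print(closure_length_range("fractional", r=450, g_prime=150, b0=33))  # (33, 150)
```

For a long resection the two ranges differ mainly in their upper limit, which is why the distribution of observed closure lengths carries information about the model.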


The mean and median value of the expected closure length for pure tandems will be (r + b0)/2 and for fractional tandems it will be less. In Fig. 2 of Bick et al. (1973) all six closure lengths for Drosophila hydei folded rings and 14 of 18 Necturus folded rings are less than (r + b0)/2. The other data in the figure are for slipped rings, which are made by denaturing and then renaturing the DNA. Such rings are not suitable for this analysis because they are not formed from intact resected DNA. The probability that these numbers represent a 1 : 1 ratio of closures shorter than (r + b0)/2 to those longer than (r + b0)/2 is less than 3%. Thus, measurements of closure lengths do not support the hypothesis that rings are formed from pure tandem repeats and in fact indicate that the closures are formed by fractional tandem repeats.

Finally, Bick et al. (1973) examined the appearance of the closure for evidence of unique DNA. If a DNA fragment is composed of unique sequences separating repetitious ones, then long resections might expose two or more repetitious sequences in each arm. Such fragments could form two or more closures separated by unique DNA. On grids spread in formamide, a fragment ring would appear to have a blister or secondary ring. The secondary ring would contain the non-complementary unique sequences separating the repetitious ones in the closures. In over 1000 rings observed, only five appeared to have secondary rings and Bick et al. (1973) concluded that the vast majority of the rings were composed of pure tandem repeats. Their argument, however, fails to take into account that long resections of fragments containing pure tandem repeats can form double-helical secondary rings. Thus the appearance of rings spread in formamide cannot be used to test for the presence of unique sequences separating repetitious ones. For both pure and fractional tandem repeats, secondary rings should appear.
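The statement that the closure-length counts are unlikely under a 1 : 1 ratio can be illustrated with a binomial tail. The text does not say which test was used, so the sketch below is an assumption: a sign test with probability ½ that a closure falls below (r + b0)/2, applied to the pooled counts of 20 short closures out of 24. The function name is hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


shorter = 6 + 14  # closures shorter than (r + b0)/2: D. hydei plus Necturus
total = 6 + 18    # all folded-ring closures measured

p_one_sided = upper_tail(shorter, total)
p_two_sided = min(1.0, 2 * p_one_sided)

print(f"one-sided p = {p_one_sided:.5f}, two-sided p = {p_two_sided:.5f}")
```

Under these assumptions both probabilities fall well below the 3% quoted in the text, so the conclusion does not depend on whether the test is one-sided or two-sided.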
Since the secondary rings will begin to appear when the resection is longer than half the length of the repetitious unit, these experiments indicate that either the repetitious units are longer than 2400 nucleotide pairs or, as is more likely, that the first closure which forms the primary ring somehow prevents the second closure necessary for the secondary ring. An additional argument for long pure tandem repeats has been given by Pyeritz & Thomas (1973). Because the findings of this report were not then available, they greatly underestimated the number of rings that could be formed from fractional tandem g-regions. As a result, they concluded that rings form either from pure tandem g-regions or fractional tandem g-regions where at most 20% of each sequence was not repetitious. As with the data shown in Fig. 7, there are many arrangements of repetitious sequences that can satisfy the ring-forming data for the Necturus and mouse DNA that they used, including g-regions which contain 50% or more non-repetitious


sequences. Thus, contrary to previous conclusions, the data from ring analyses cannot exclude the presence of fractional tandem sequences in the rings. It is ironic that most, if not all, of the rings actually do contain pure tandem sequences, but the lengths of the repetitious units and sequences are very much shorter than the 700 or more nucleotides originally envisioned.

6. Conclusion

The equations derived in this paper extend the work of Thomas, Zimm & Dancis (1973c) and enable the calculation of the frequency of cyclization for DNA which contains repetitious sequences separated by unique sequences at regular intervals (fractional tandem). The same equations can also be applied to DNA with adjacent repetitious sequences (pure tandem) or DNA with irregular spacing of the repetitious sequences (intermittent repetition). Experimentally, ring formation can demonstrate the presence of clusters of similar repetitious DNA sequences. It is shown that, contrary to the conclusions of Thomas et al. (1973b), ring formation data are consistent with pure tandem and some fractional tandem models. Thus, ring analysis is more sensitive to the presence than to the arrangement of repetitious sequences.

The author thanks Dr C. A. Thomas, Jr. for providing the facilities for part of this work at Harvard Medical School, Boston, MA. He also thanks Dr Thomas for introducing him to the problem and for many useful discussions. The author is grateful to Dr C. S. Lee for his Drosophila data, and Dr H. Rappaport for critically reading parts of the manuscript. This work was initially supported by a National Institutes of Health Postdoctoral Fellowship (1-F02-GM 49,155). Subsequent support by the American Cancer Society (IN-88G) and a Temple University Grant-in-Aid of Research is gratefully acknowledged.

REFERENCES

BICK, M. D., HUANG, H. L. & THOMAS, C. A., JR. (1973). J. molec. Biol. 77, 75.
BONNER, J. & WU, J.-R. (1973). Proc. natn. Acad. Sci. U.S.A. 70, 535.
GALL, J. G. & ATHERTON, D. D. (1974). J. molec. Biol. 85, 633.
HAMER, D. H. & THOMAS, C. A., JR. (1975). Chromosoma 49, 243.
LAIRD, C. D., CHOOI, W. Y., COHEN, E. H., DIXON, E., HUTCHINSON, N. & TURNER, S. A. (1973). Cold Spring Harb. Symp. Quant. Biol. 38, 311.
LEE, C. S. & THOMAS, C. A., JR. (1973). J. molec. Biol. 77, 25.
PEACOCK, W. J., BRUTLAG, D., GOLDRING, E., APPELS, R., HINTON, C. W. & LINDSLEY, D. L. (1973). Cold Spring Harb. Symp. Quant. Biol. 38, 405.
PYERITZ, R. E. & THOMAS, C. A., JR. (1973). J. molec. Biol. 77, 57.
SCHACHAT, F. H. & HOGNESS, D. S. (1973). Cold Spring Harb. Symp. Quant. Biol. 38, 371.
THOMAS, C. A., JR. & DANCIS, B. M. (1973a). Appendix, Lee, C. S. & Thomas, C. A., Jr. J. molec. Biol. 77, 43.
THOMAS, C. A., JR., PYERITZ, R. E., WILSON, D. A., DANCIS, B. M., LEE, C. S., BICK, M. D., HUANG, H. L. & ZIMM, B. H. (1973b). Cold Spring Harb. Symp. Quant. Biol. 38, 353.
THOMAS, C. A., JR., ZIMM, B. H. & DANCIS, B. M. (1973c). J. molec. Biol. 77, 85.