On the symmetries of multi-palindromic DNA sequences

J. theor. Biol. (1978) 72, 57-73

DAVID J. GALAS†

Biomedical Sciences Division, University of California, Lawrence Livermore Laboratory, Livermore, California 94550, U.S.A.

(Received 28 September 1977)

There are several instances of multiple, overlapping palindromes in DNA sequences which have recently been reported. To ascertain the likelihood of specific biological function for these interdigitating symmetries it is necessary to calculate their probabilities of occurrence. While the probability of occurrence of a single palindrome in a random sequence is calculated in a straightforward fashion, the occurrence of overlapping palindromes is not so simply analyzed. In this paper a general method for handling multiple symmetries is presented. Several theorems concerning the constraints which overlapping symmetries place on each other are presented. A general result is that the probability of occurrence of a symmetry can be significantly enhanced by already existing symmetries. As examples of the theorems and methods developed, the lac operator, the lac CAP-binding site and a region of the λ left operator are examined. The occurrence of several overlapping symmetries appears to be fairly common in such sequences.

† Present address: Dépt. Biologie Moléculaire, Univ. de Genève, CH-1211 Genève, Switzerland.

‡ Definition of terms: in this paper we will refer to a palindrome or palindromic symmetry as any partial 2-fold symmetry, even a single pair. A perfect palindrome is a sequence in which all pairs are symmetric about an axis out to some limit. Unless otherwise stated "symmetry" refers to palindromic symmetry.

0022-5193/78/0508-0057 $02.00/0 © 1978 Academic Press Inc. (London) Ltd.

1. Introduction

The discovery of the predominance of palindromic symmetries‡ in DNA sequences with control functions (promoter and operator sites) has focused attention on the possible role the symmetry itself might play in the recognition of the DNA sequence by regulatory protein molecules. Interest in sequence symmetries has also been enhanced by the possibility that gene translocations and recombinations may occur at palindromic sites (Hozumi & Tonegawa, 1976, for example). The DNA sequence specificity of the DNA-protein interaction, however, is not embodied in the symmetry alone: the lac repressor-operator binding specificity, for example, can be degraded both by mutations which decrease and those which increase the operator symmetry (Gilbert, Gralla, Majors & Maxam, 1975). While the function of the symmetries is apparently not simple, it is clear that palindromic symmetry is somehow an important property of some DNA control sequences. The symmetry of the interacting protein molecule is clearly an important factor, but it is by no means the full story. For a more detailed discussion of the role of symmetry in DNA-protein interaction refer to a recent review (Gilbert, Majors & Maxam, 1976).

Recently Maniatis, Ptashne, Barrell & Donelson (1974) determined that the N gene-proximal segment of the left operator in bacteriophage λ has three interdigitating palindromic symmetries. These symmetries are quite distinct in their spatial patterns and sequence, and have close but separate axes of symmetry. It has been suggested that this multi-palindromic sequence constitutes a "multiplexing" control region for three different regulatory proteins.

In order to address the question of the possible biological function of a multi-palindromic sequence or any symmetric region one must answer the questions: (1) how likely is the symmetry to appear in a randomly chosen sequence, and (2) given that one or more symmetries are present in a sequence, what is the likelihood that a particular additional symmetry occurs by chance? The first question is answered rather simply, but the second is not. The restrictions implied by the assumed symmetries may substantially change the probability of another symmetry being present. At one extreme, existing symmetries may be incompatible with, and therefore exclude, a particular additional symmetry. At the other extreme, existing symmetries may completely determine an additional symmetry, and therefore require its presence. It is clear, then, that the presence of one symmetry may restrict the variety of other symmetries which may coexist with it within the same DNA sequence.

In this paper we show how the interaction of the constituent symmetries in a multi-palindromic sequence may be analysed to determine how much of the structure of a palindrome is required for it to be compatible with other palindromes.
We outline a method for the representation of symmetry constraints on a class of DNA sequences which is quite general and show how this representation may be used to analyse multiple symmetries. The results enable us to make some general observations on the probability of random occurrence of multiple palindromes. The λ O_L sequence is treated as an example of this process of analysis. The CAP-binding sequence of the lac control region is also briefly examined.

2. Analysis

(A) THE RANDOM OCCURRENCE OF PALINDROMES

Considering a sequence of length 2N, it is quite straightforward to calculate the probability, given equal likelihood of each base at each position, that 2n of the 2N bases are related by palindromic symmetry about some specified axis. Consider the axis at the center, with N bases on each side. The probability of any given pair, at equal distances from the axis, being complementary is, we assume, 1/4; the probability that there are n such pairs (in any order or position) is:

$$P(n, N) = \binom{N}{n} \left(\tfrac{1}{4}\right)^{n} \left(\tfrac{3}{4}\right)^{N-n}, \qquad \text{where} \quad \binom{N}{n} = \frac{N!}{(N-n)!\,n!}. \tag{1}$$
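Equation (1) is a binomial distribution and can be evaluated directly. The following is a minimal sketch in Python, assuming equal base frequencies so that a mirror pair is complementary with probability 1/4. The function names and the `p` parameter are illustrative choices made here, not part of the original paper. The second function sums the upper tail, that is, the chance of finding at least n complementary pairs about a given axis.

```python
from math import comb


def pair_count_probability(n: int, N: int, p: float = 0.25) -> float:
    """Probability that exactly n of the N mirror pairs about a fixed axis
    are complementary, each independently with probability p (equation 1)."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


def at_least_probability(n: int, N: int, p: float = 0.25) -> float:
    """Probability that n or more of the N mirror pairs are complementary."""
    return sum(pair_count_probability(k, N, p) for k in range(n, N + 1))


if __name__ == "__main__":
    N = 12  # a 24-base sequence has 12 pairs about its central axis
    for n in range(N + 1):
        print(n,
              round(pair_count_probability(n, N), 5),
              round(at_least_probability(n, N), 5))
```

Passing a different `p` is only a crude way to explore unequal base composition; a proper treatment would depend on the individual base frequencies.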

A different average base-pair composition alters (and somewhat complicates) this calculation. The probability that there is a palindrome involving at least n pairs (in any order) is therefore:

$$P(n \text{ or more}/N) = \sum_{k=n}^{N} \binom{N}{k} \left(\tfrac{1}{4}\right)^{k} \left(\tfrac{3}{4}\right)^{N-k}. \tag{2}$$

For a random, 24 base sequence, for example, we would expect, with p ≈ 0.32, that we would find a palindromic symmetry involving four or more pairs. To illustrate this relation we examined (using the representation discussed in the next section) a segment within the lac operator, looking for symmetries in addition to the principal one. Excluding the principal symmetry we found 19 axes of symmetry involving four or more pairs (see Fig. 1). If in each of these symmetries we restrict the sequence considered to 12 bases on each side of each axis, we count 16 symmetries with four or more pairs, a frequency of 0.44, as compared to the predicted random average of 0.32.

FIG. 1. The palindromes of the lac operator region. The symmetry indicated by the boxes is the principal palindrome. The arrows indicate the axis of symmetry for all the palindromes and the numbers to the right indicate the number of pairs involved.

In Fig. 2 is plotted the frequency of n-pair palindromic symmetries in random 24-base (solid line) and 30-base (broken line) sequences. While the appearance of fairly weak symmetries (six pairs or less for this example) is quite frequent, stronger symmetries (like the principal symmetry of the lac operator, for example, with 14 pairs) are very rare among random sequences. It is important to keep in mind that in this calculation no distinction is made between different symmetry patterns (positions of symmetric pairs); only the number of bases involved is considered. Also plotted in Fig. 2 are the actual numbers of palindromes in the lac operator involving n or more pairs when 12 bases to each side of the axis were considered. This makes it fairly clear that the additional symmetries in the lac operator appear with approximately the frequency we expect from random sequences. We do not expect, therefore, that these secondary symmetries have a specific biological function.

FIG. 2. The frequency of n-pair palindromes in a random sequence, obtained by evaluating equation (2) for a 24 base sequence (solid line) and a 30 base sequence (broken line). The data points are the frequencies in the lac operator and the error bars indicate estimates of Poisson standard error. [Horizontal axis: number of pairs in palindrome (n).]

If a particular symmetry is required for some recognition or binding function it would be most unfortunate to have a high probability of occurrence of similar symmetries in the surrounding DNA which may compete with it as a recognition or binding site. There is some uncertainty about this point, though, since the specific sequences within the symmetric region may be sufficient to distinguish the functional region from its interfering likenesses with the same degree of symmetry [see section 2(F)]. The lac operator is certainly not a randomly chosen sequence with respect to symmetry in the sense that it does have a strong symmetry, and yet it appears that, with regard to the frequency of occurrence of palindromes other than the principal one, it is very much like a random sequence. The deviation from random, if any, is toward a higher than random occurrence, which is what we expect from the considerations of section 2(D).

The symmetry properties of a specific sequence, like the lac operator, are relatively simple to represent and analyze. But the properties of a set of sequences which are related only by their possessing a common symmetry are more difficult to represent and analyze because the common symmetry only implies that certain relations between bases must be in effect within all members of the set. Probability calculations concerning this set may be thus affected. To handle these relations we therefore require a simple method for representing the constraints.

(B) THE CONSTRAINT MATRIX

We would like to consider a particular subset of the set of all DNA sequences of a given length: the subset of those having some particular symmetry. A convenient way to represent such a set is to construct a representation, which we call the constraint matrix, of the relationships between the bases implied by the symmetry. For sequences N bases long an N × N matrix can represent the relation, if any, of each base to every other base. For the elements of this matrix {$C_{ij}$} we will use the convention that $C_{ij} = -1$ indicates that base i is complementary to base j, and $C_{ij} = +1$ indicates that base i is identical to base j. Unless specifically stated otherwise we will use $C_{ij} = 0$ to indicate the absence of a constraint, rather than the constraint requiring the bases i and j to be non-identical and non-complementary. The matrix is necessarily symmetric, i.e. $C_{ij} = C_{ji}$. The set of sequences of length N with no constraints is represented by a matrix with ones along the diagonal (upper left to lower right) and zeroes elsewhere. A perfect palindrome is represented by the matrix with the same diagonal of ones and an inverse diagonal (lower left to upper right) of minus ones: the axis of 2-fold symmetry in the palindrome corresponds to the point in the matrix where the diagonals cross. As an illustration of this representation the palindromic symmetry with the structure shown in Fig. 3(a) is represented by the matrix in Fig. 3(b) (where blanks are substituted for zeros). Any constraint matrix represents the set of all sequences in which the indicated relationships apply. In the case given in Fig. 3, the matrix is complete, that is, no more relationships are implied by the palindromic symmetry.
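As a concrete illustration of this representation, the sketch below builds the constraint matrix for a palindrome specified by its complementary pairs. It is a minimal Python sketch rather than the author's own procedure: the function names, the 1-based pair convention and the six-base example are assumptions made here for illustration.

```python
def constraint_matrix(length, pairs):
    """Return the length x length constraint matrix for a palindrome.

    pairs lists the complementary positions (i, j), 1-based as in the text.
    Entries: +1 identical, -1 complementary, 0 no constraint.
    """
    C = [[0] * length for _ in range(length)]
    for i in range(length):
        C[i][i] = 1  # every base is identical to itself
    for i, j in pairs:
        C[i - 1][j - 1] = -1
        C[j - 1][i - 1] = -1  # keep the matrix symmetric
    return C


def perfect_palindrome_pairs(length):
    """Pairs of a perfect palindrome spanning the whole sequence.
    For odd lengths the central base is left unpaired."""
    return [(k, length + 1 - k) for k in range(1, length // 2 + 1)]


if __name__ == "__main__":
    # Hypothetical 6-base perfect palindrome: diagonal of +1,
    # anti-diagonal of -1, zeros elsewhere.
    for row in constraint_matrix(6, perfect_palindrome_pairs(6)):
        print(" ".join(f"{x:2d}" for x in row))
```

A hyphenated (partial) palindrome is obtained simply by passing only the pairs that are actually complementary.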

FIG. 3. A hypothetical symmetry (a), and the corresponding constraint matrix (b) (described in the text).

But when two or more symmetries are imposed, relationships besides the mere representation of the symmetries are implied. By applying simple rules one can determine the implied relations, fill in the matrix and check the consistency of the logical relationships represented. To go from the representation of the symmetries to a complete, consistent constraint matrix the following two rules may be applied:

(1) A right triangle of any three elements determines a fourth element at the corner needed to make a rectangle.

(2) All rectangles must be filled in such that the product of all (non-zero) elements is +1 (i.e. they must have none or two each of +1's and -1's):

$$\begin{bmatrix} x_1 & x_2 \\ x_3 & \cdot \end{bmatrix} \rightarrow \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} \quad \text{such that} \quad x_1 x_2 x_3 x_4 = +1.$$
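The two rules above can be applied mechanically. The sketch below is one possible Python rendering, not the author's own procedure: it sweeps over every rectangle of the matrix, fills a corner whenever exactly three are known, and reports a contradiction when a fully known rectangle has product -1. The function names and the eight-base example are hypothetical.

```python
from itertools import combinations


def complete(C):
    """Return a completed copy of the symmetric constraint matrix C.

    Rule 1: three known corners of a rectangle fix the fourth.
    Rule 2: the four corners of every filled rectangle multiply to +1.
    Raises ValueError if the constraints contradict each other.
    """
    n = len(C)
    C = [row[:] for row in C]
    changed = True
    while changed:  # repeat until no rectangle yields a new element
        changed = False
        for i, j in combinations(range(n), 2):       # two rows
            for k, l in combinations(range(n), 2):   # two columns
                corners = [(i, k), (i, l), (j, k), (j, l)]
                values = [C[r][c] for r, c in corners]
                product = 1
                for v in values:
                    if v != 0:
                        product *= v
                missing = values.count(0)
                if missing == 0 and product != 1:
                    raise ValueError("contradictory constraints")
                if missing == 1:
                    r, c = corners[values.index(0)]
                    C[r][c] = C[c][r] = product  # keep C symmetric
                    changed = True
    return C


def matrix_from_pairs(length, pairs):
    """Diagonal of +1 plus -1 for each complementary pair (1-based)."""
    C = [[int(i == j) for j in range(length)] for i in range(length)]
    for i, j in pairs:
        C[i - 1][j - 1] = C[j - 1][i - 1] = -1
    return C


if __name__ == "__main__":
    # Hypothetical example: two perfect palindromes in an 8-base sequence,
    # with axes between bases 4 and 5 and between bases 5 and 6.
    first = [(4, 5), (3, 6), (2, 7), (1, 8)]
    second = [(5, 6), (4, 7), (3, 8)]
    for row in complete(matrix_from_pairs(8, first + second)):
        print(" ".join(f"{x:2d}" for x in row))
```

The repeated sweep costs on the order of N^4 operations per pass, which is acceptable for sequences of a few tens of bases; it was chosen here because it mirrors the geometric statement of the rules rather than for speed.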

These geometric rules are the consequences of very simple logical relationships. For example, if two bases are each complementary, or each equal, to a third base they must be equal. If a base is complementary (or equal) to a second base it must be complementary (or equal) to all bases which are identical to the second. Also, a base must be equal (or complementary) to all bases complementary to the second. These requirements are embodied in the general rule for elements of the matrix:

$$C_{ij} = C_{ik} C_{jk}. \tag{3}$$

The "rectangle rule" is a simple extension of this relation, since $C_{jk}$ in equation (3) can be written in terms of any two (non-zero) elements, $C_{jl} C_{kl} = C_{jk}$. Putting this in equation (3) yields:

$$C_{ij} = C_{ik} C_{jl} C_{kl}, \tag{4}$$

which is equivalent to rule (2). The reader can easily verify the validity of the rules by putting the possible relationships between the set of four rectangular elements into equation (4).

To further illustrate the process of representation and the relationships generated by coexisting symmetries, take the example of imposing a second symmetry on the class of sequences represented by the matrix in Fig. 3. The second symmetry, chosen arbitrarily, is illustrated in Fig. 4(a) (broken line), and represented by a second anti-diagonal line of minus ones in Fig. 4(b). The coexistence of the two symmetries requires that additional relations hold. These are marked in squares in the constraint matrix of Fig. 4(b). These implied relations are generated by application of the "rectangle rule." The class of all those sequences containing both symmetries must have the additional properties indicated, which include an additional partial palindrome (of two pairs) centered between bases 7 and 8.

FIG. 4. The symmetry of Fig. 3 with an additional symmetry superposed, and the resulting constraint matrix.

(C) SEVERAL THEOREMS

The process illustrated here can be used to reach several general conclusions concerning the coexistence of symmetries, which are presented here in the form of theorems. Proof of these theorems is omitted in the interest of brevity. They are fairly straightforward, and the interested reader may demonstrate them to himself most convincingly by following the examples given or by constructing his own.

Theorem I

Any two palindromic symmetries can coexist in a set of sequences with no other constraints. This theorem means that one can always find sequences of specified length which contain any two specified symmetries. There are no two incompatible symmetries. The simplest demonstration of this theorem lies in the fact that, in adding the constraints of a second palindrome to a constraint matrix with an existing palindrome, a contradictory constraint can never be generated.

Theorem II

Two coexisting palindromic symmetries may require the existence of additional palindromic symmetries. All these additional symmetries will have centers equally spaced at a distance from each other and the original two axes of symmetry, equal to the distance between the axes of the original two symmetries (as shown in Fig. 5, induced symmetries may have axes of symmetry only at the open arrows if the original symmetry axes are d bases apart). An immediate corollary to this theorem is that no symmetry with its axis between the axes of two previously existing symmetries is required by them.

FIG. 5. Illustrations of the axis spacing discussed in theorem II.
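Theorem II can be checked on small cases. The sketch below is an illustration added here and does not use the matrix procedure of the text: it tracks the same identical/complementary relations with a signed union-find structure (a substituted technique) and lists the complementary pairs implied by two palindromes, grouped by axis. The class and function names and the 12-base example are hypothetical.

```python
class SignedUnionFind:
    """Union-find over bases in which each link carries a sign:
    +1 for 'identical', -1 for 'complementary'."""

    def __init__(self, n):
        self.parent = list(range(n + 1))  # index 0 unused; bases are 1-based
        self.sign = [1] * (n + 1)         # relation of a node to its parent

    def find(self, x):
        """Return (root, sign of x relative to that root)."""
        s = 1
        while self.parent[x] != x:
            s *= self.sign[x]
            x = self.parent[x]
        return x, s

    def relate(self, i, j, c):
        """Impose relation c (+1 or -1) between bases i and j."""
        ri, si = self.find(i)
        rj, sj = self.find(j)
        if ri == rj:
            if si * sj != c:
                raise ValueError("contradictory constraints")
            return
        self.parent[ri] = rj
        self.sign[ri] = si * sj * c  # makes relation(i, j) equal to c

    def relation(self, i, j):
        """Return +1, -1, or 0 if i and j are unrelated."""
        ri, si = self.find(i)
        rj, sj = self.find(j)
        return si * sj if ri == rj else 0


def induced_symmetries(length, first, second):
    """Map each induced axis position to the complementary pairs it gains."""
    uf = SignedUnionFind(length)
    for i, j in first + second:
        uf.relate(i, j, -1)
    given = {tuple(sorted(p)) for p in first + second}
    induced = {}
    for i in range(1, length + 1):
        for j in range(i + 1, length + 1):
            if (i, j) not in given and uf.relation(i, j) == -1:
                induced.setdefault((i + j) / 2, []).append((i, j))
    return induced


if __name__ == "__main__":
    # Hypothetical partial palindromes with axes at 5.5 and 8.5 (d = 3).
    first = [(5, 6), (4, 7), (2, 9), (1, 10)]
    second = [(8, 9), (7, 10), (6, 11), (5, 12)]
    print(induced_symmetries(12, first, second))
```

For this example the induced pairs fall on axes 2.5 and 11.5, one spacing d outside each original axis, and none lies between the two original axes, as the corollary states.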

Theorem III

In contrast to theorem I, two symmetries cannot, in general, coexist with any third symmetry. The additional relations required to make the two symmetries compatible may exclude certain additional symmetries from being imposed. Another way of stating this is that the set of all sequences which contain the first two symmetries might contain no sequence which also contains the third symmetry. A corollary to this theorem is that a fourth symmetry may be excluded by three compatible ones, and so on.

The intuitive notion that two perfect palindromes ("non-hyphenated"), by imposing very strong constraints on each other, may not be compatible at all is contradicted by theorem I. It so happens, however, that for three perfect palindromes the situation can be just the opposite.

Theorem IV

Three base-centered, mutually-overlapping, perfect palindromes are never compatible. (A base-centered palindrome has its axis of symmetry on a base rather than on the space between bases.) Three space-centered, overlapping, perfect palindromes are compatible, however.

As an illustration of theorem IV, as well as an illustration of the very regular symmetries generated by making two perfect palindromes coexist, we display the constraint matrix for two base-centered palindromes in Fig. 6. The original palindromes are indicated at the top of the figure. The requirement that the two perfect palindromes coexist produces eight new palindromes, none of which is without gaps (that is, they are hyphenated). These palindromes required for the compatibility of the perfect palindromes are shown beneath the matrix in Fig. 6. In the case of overlapping space-centered, perfect palindromes the situation is fairly trivial, however; in fact, we may state a theorem concerning this situation.

Theorem V

In the region of overlap of two space-centered, perfect palindromes the symmetry required by the palindromes is a tandem repetition of a palindrome of length equal to twice the distance between the axes of the larger palindromes. In the overlap region then, two coexisting, perfect palindromes are equivalent to a tandem repeating palindrome.

(D) RANDOM OCCURRENCE OF A PALINDROME IN THE PRESENCE OF OTHERS

Up to this point we have considered only the compatibility or incompatibility of symmetries, and the nature of the implied constraints. We now take up the problem of determining how the probability of occurrence of one symmetry is affected by the presence of another (or others). The a priori probabilities of occurrence of the symmetries in our original example (Figs 3 and 4) are easily calculated. Since no generality is lost by doing so let us consider these. The particular symmetry of Fig. 3 (call it no. 1) occurs by chance in a random 20 base sequence with a probability of $(3/4)^4 (1/4)^6 = 7.7 \times 10^{-5}$. (We are including the gaps in the specification of the symmetry, i.e. the pattern is specified.) Likewise the probability of the additional chance occurrence in the same 20 base sequence of the symmetry in Fig. 4 (call it no. 2) is $5.5 \times 10^{-4}$.

FIG. 6. Two perfect, base-centered palindromes and the resulting constraint matrix. The "induced" palindromes are indicated beneath the matrix with their axes and the number of pairs involved.

In calculating the probability of occurrence of symmetry 2 in a sequence which already contains symmetry 1 we are essentially counting the number of all sequences which contain both symmetries and the number of sequences which contain the first one (whether or not no. 2 is present) and taking the ratio. A more convenient way of actually performing the calculation, however, is to count the number of independent constraints which contribute to the restriction of the set: any constraint which is derivable from those previously imposed clearly is already fulfilled by the members of the set and must not be counted. Thus a crucial part of the calculation is to determine, by some criteria, which of the new palindromic constraints are independent.

For the situation of adding palindrome no. 2 to no. 1 intuition might lead us to believe that, since the palindromes have different axes and the constraints of one affect different pairs of bases, all the constraints specifying no. 2 are actually independent. This is indeed the case. A simple demonstration of this fact can be performed by considering a small palindrome and enumerating all the sequences which fit the constraints, then requiring any one of the possible overlapping symmetries and selecting those of the larger set which fit the new constraints. The probability so obtained will always be identical to the a priori probability. We state this result as a corollary to theorem I: the probability of finding any two palindromes in a random sequence is equal to the product of their a priori probabilities:

Pr (1 given 2) = Pr (1)
Pr (2 given 1) = Pr (2)

A proof of this can be constructed using the constraint matrix and the "rectangle rule" to show that two palindromic symmetries cannot generate any derivative constraints which fall within either of the palindromes (in the sense of giving a non-zero element along the appropriate anti-diagonal of the constraint matrix); this proof will not be presented here.

When we consider the probability of adding a third symmetry to two previously existing symmetries the independence of the constraints no longer applies, in general. For example, the symmetry in Fig. 4(b) with its axis between bases 7 and 8 is required by the two symmetries in Fig. 4(a). The probability of this symmetry in the presence of the other two is therefore equal to one. These notions are summarized in theorem VI.

Theorem VI

The probability of chance occurrence of a third palindrome in the presence of two others, Pr (3 given 1 and 2) ≡ Pr (3/1, 2), is either equal to zero (incompatibility) or:

Pr (3/1, 2) ≥ Pr (3),

where Pr (3) is the a priori probability for symmetry 3. This result has the implication that whenever two palindromes overlap the probability of a third one occurring can be significantly enhanced. In the next sections we examine such a case.

(E) MULTIPLE PALINDROMES OF THE LEFT OPERATOR OF λ

Within a particular 29 base sequence in the N-proximal region of the left operator of λ there is a triply-palindromic sequence (Maniatis et al., 1974; Ptashne et al., 1976). It is illustrated in Fig. 7. The only immediately obvious feature of these palindromes is that the symmetry axes are evenly spaced (see theorem II). Since the compatibility constraints of symmetries I and II can appear as a symmetry about the axis of III, and the constraints for compatibility of II and III can appear as a symmetry about the axis of I, it is of some interest to see how much of the symmetry of I and III is just part of the compatibility conditions for II and III, and I and II respectively.

FIG. 7. A sequence of the left operator of λ phage, as reported by Maniatis et al., 1974, with the three interdigitating symmetries (I, II, III) shown. Sequence (bases 1 to 29): GCTCAGTATCACCGCCAGTGGTATTTATG

Constructing the constraint matrix for symmetries II and III [shown in Fig. 8(b)] we generated the additional symmetries by applying the "rectangle rule". They have axes between bases 13 and 14, at 15, and between 19 and 20, and involve 1, 3 and 1 pairs respectively. The most significant symmetry (three pairs) has its axis coincident with palindrome I of Fig. 7. Comparing this palindrome with this induced three-pair symmetry we find that three of the eight pairs involved in I are required for the compatibility of II and III. A similar construct for symmetries I and II [shown in Fig. 8(a)] shows that I and II generate four symmetries of 1, 2, 3 and 1 pairs respectively. Again, the strongest of these lies in palindrome III, accounting for three of the six symmetric pairs. Half of palindrome III is implied by I and II. For completeness we also show the constraint matrix for I and III [Fig. 8(c)]. As we expect from theorem II, none of symmetry II is required for the compatibility of I and III.

We can now address the question of how likely is the chance occurrence of each of the palindromes, in the presence of the other two. The problem may be approached by asking what relations between bases are required for the third symmetry to be included which are not required for the two existing symmetries. Let us ask first: what is the probability that III will occur in a sequence which has symmetries I and II? The additional relations required can be discovered by adding the constraints defining III to the constraint matrix of I plus II [Fig. 8(a)]. These additional independent constraints need to be added to the existing constraints in Fig. 8(a) to obtain symmetry III. They are indicated by the open squares. This then implies that the probability that a sequence containing I and II will also contain III (at least) is $(1/4)^3$, or ≈ 1.6%.

In a similar manner the two situations represented in Fig. 8(b) and (c) are calculated, yielding:

Pr (I given II and III) ≈ 0.1%
Pr (II given I and III) = Pr (II, a priori) = 6 × 10^-5
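The pair counts behind these figures can be reproduced from the printed sequence. The sketch below is an illustration under stated assumptions, not the author's calculation. The axis positions (I at base 15, II between bases 16 and 17, III at base 18) are inferred here from the induced axes quoted in the text and the even spacing of theorem II. The complementary pairs are recomputed from the sequence rather than read from Fig. 7. Following the convention of the text, each pair of the third palindrome that is not already implied contributes a factor 1/4; the sketch does not test whether those remaining pairs are independent of one another. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
SEQUENCE = "GCTCAGTATCACCGCCAGTGGTATTTATG"  # 29 bases, as printed in Fig. 7

# Twice the axis position, so that space-centred axes stay integers.
# ASSUMPTION: I at base 15, II between bases 16 and 17, III at base 18.
AXIS_SUM = {"I": 30, "II": 33, "III": 36}


def palindrome_pairs(seq, axis_sum):
    """Complementary pairs (i, j), 1-based, mirrored about axis_sum / 2."""
    pairs = []
    for i in range(1, len(seq) + 1):
        j = axis_sum - i
        if i < j <= len(seq) and COMPLEMENT[seq[i - 1]] == seq[j - 1]:
            pairs.append((i, j))
    return pairs


def implied_count(third, others, length):
    """Count pairs of `third` already forced by the pairs in `others`.
    Assumes `others` is self-consistent (no contradiction check)."""
    parent = list(range(length + 1))
    sign = [1] * (length + 1)  # relation of each node to its parent

    def find(x):
        s = 1
        while parent[x] != x:
            s *= sign[x]
            x = parent[x]
        return x, s

    for i, j in others:  # impose 'complementary' (-1) on each pair
        (ri, si), (rj, sj) = find(i), find(j)
        if ri != rj:
            parent[ri], sign[ri] = rj, -si * sj
    count = 0
    for i, j in third:
        (ri, si), (rj, sj) = find(i), find(j)
        if ri == rj and si * sj == -1:
            count += 1
    return count


if __name__ == "__main__":
    pairs = {name: palindrome_pairs(SEQUENCE, s) for name, s in AXIS_SUM.items()}
    for name in pairs:
        others = [p for other in pairs if other != name for p in pairs[other]]
        implied = implied_count(pairs[name], others, len(SEQUENCE))
        remaining = len(pairs[name]) - implied
        print(name, len(pairs[name]), implied, 0.25 ** remaining)
```

Under these assumptions the sketch finds 8, 7 and 6 pairs for I, II and III, with three pairs of III implied by I and II, three pairs of I implied by II and III, and none of II implied by I and III, in line with the counts quoted in the text.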

FIG. 8. The constraint matrices for the three λ operator symmetries of Fig. 7 taken two at a time. The upright squares indicate the third symmetry and show the degree of coincidence with the constraints required by the other two, as discussed in the text.

The chance occurrence of II is not enhanced by the presence of I and III, but the chance occurrence of III is enhanced by a factor of 64 by the presence of I and II. The same is true for I in the presence of II and III. It would appear from these considerations that none of the three symmetries is simply the consequence of the other two, but two of these are partially determined by the others and the probability of their random occurrence is quite significantly increased. If more than one of these palindromes has a biological function it is most likely the pair I and II or II and III, since the occurrence of these pairs is statistically independent. It may be, of course, that all but one are random occurrences playing no biological role.

(F) THE CAP REGION OF THE lac OPERON

The CAP factor, which affects the level of promotion of transcription, binds to a specific site preceding the lac operator (Majors, 1975). In this binding region two overlapping symmetries have been identified: one by Dickson et al. (1975), first proposed as the binding symmetry, and a second one by Majors (1975), which genetic and methylation-protection data indicate is the binding symmetry. From the previous discussion it should be clear that we might well expect more symmetries to be present than just these two. In Fig. 9 we display all the symmetries actually present in this region with 3 or more pairs of bases, demonstrating that there are several significant ones besides the two already mentioned. Number 3 is the likely binding symmetry (Majors), and the symmetry numbered 1 is that identified by Dickson et al. The striking regularity of the spacing of the axes of 1 through 4 indicates that these are related. From theorem II we can immediately see that any two neighboring symmetries require constraints which fall into the other symmetries. They cannot, then, be independent. From the evidence in Fig. 9 alone it would be impossible to decide which symmetry or symmetries might be important for the protein binding, and only the experimental evidence relating to the actual CAP-DNA interaction can point to the binding symmetry. This region, in which three or four symmetries of roughly the same strength are clustered together, is quite unlike the lac operator, among whose symmetries one is quite strong and the others are fairly weak.

FIG. 9. The sequence and symmetries of the CAP binding region of the lac promoter. Sequence: AATGTGAGTTAGCTCACTCATT

(G) THE EVOLUTION OF MULTIPLE PALINDROMES

While it is easy to imagine that a palindrome can be produced by a single genetic event, it is more difficult to make a reasonable similar hypothesis concerning the production of overlapping palindromes. One might well imagine that the number of additional identities (or complementarities) required to add a particular symmetry is also a good measure of how many single-base changes might be required for a sequence to acquire that symmetry. This is not entirely true, and those few cases in which it is not point to an interesting phenomenon. This effect occurs when two or more bases must be changed to produce an additional symmetry but the existing symmetry is such that they must change simultaneously to avoid degrading the existing symmetry. It seems appropriate to call this a symmetry block, since under normal circumstances only successive single-base changes occur by spontaneous mutation (at most perhaps two contiguous bases). If the symmetries have essential biological functions the requirement for some symmetry may prove to be an impediment to the stepwise evolution of another symmetry.

There are several ways that the effect of a symmetry block could be mitigated in vivo. For example, a partial loss of symmetry might be tolerated to some extent, or perhaps not selected against at all. If the biological function depends on the interaction with a regulatory protein it is possible that a mutation in the protein could compensate for (suppress) the loss of symmetry. Certainly, in the absence of knowledge of what precise role is played by the symmetry of control sequences we can say little about the selective pressures involved. There are certainly analogous restrictions in effect in the evolution of a polypeptide, whose function is a highly pleiotropic and integrative property of structure. This section can be summarized in direct analogy by saying that control sequences are probably similarly constrained.

3. Discussion

A consideration of clear importance is that of the influence of the DNA sequence, apart from its symmetry, in the DNA-regulatory protein interaction. The constraints imposed by the requirements for specificity of interaction, such as the specificities of the DNA-binding protein sequence, could possibly override the constraints imposed by compatibility and conservation of symmetries. It is likely, however, that these considerations of symmetry are quite important, and a reasonable point of view may be that there is a component of the DNA-protein interaction dependent only on the symmetry properties of the sequence and another component dependent only on the particular bases matching the protein's structure. Obviously these specific components of the sequence may be reflected across the axis and form a symmetric binding site, but other components may be independent of symmetry considerations. In any case, it is only the symmetry itself we consider here, and this is only one part of the important structure of a binding site.

I have analyzed in this paper the ways in which the constraints of palindromic symmetry interact when imposed on the same sequence to require additional symmetries. A simple method for representing and manipulating constraints of this sort, the constraint matrix, was introduced. After exhibiting several theorems concerning coexisting symmetries, we presented an analysis of a λ O_L segment. Here we found that these three palindromes are particularly well adapted to each other, in that they are evenly spaced and their pairwise compatibility conditions are well matched in each case to the third palindromic symmetry. The third symmetry was shown by this analysis to be much more likely to occur by chance in this context than in a random, non-symmetric sequence. In particular, shifting to unequal spacing between axes of symmetry would generate many new constraints which are not required when the axes are equidistant. For these reasons it seems unlikely that a biological function is assigned to the third symmetry. A similar observation was made concerning the symmetries of the CAP binding sequence of the lac control region.

It is reasonably clear from our analysis that upon finding a DNA sequence with multiple, overlapping palindromes one should exercise great care in attributing control functions to these. Not only is the physical role of symmetry in a binding site yet unclear (Gilbert, Majors & Maxam, 1976) but the chance occurrence of a palindrome in the presence of others may be significantly enhanced. The methods described here can be used to quantitatively evaluate this enhancement.

I would like to dedicate this paper to the memory of Debra Marie Galas. This work was performed under the auspices of the U.S. Energy Research and Development Administration, Contract No. W-7405-ENG-48.

REFERENCES

DICKSON, R., ABELSON, J., BARNES, W. & REZNIKOFF, W. (1975). Science 187, 27.
GILBERT, W., GRALLA, J., MAJORS, J. & MAXAM, A. M. (1975). In Protein-Ligand Interactions (H. Sund & G. Blauer, eds), pp. 193-210. Berlin: Walter de Gruyter.
GILBERT, W., MAJORS, J. & MAXAM, A. M. (1976). In Organization and Expression of Chromosomes (V. G. Allfrey, E. K. F. Bautz, B. J. McCarthy, R. T. Schimke, A. Tissieres & J. Paul, eds), pp. 167-176. Berlin: Dahlem Conference.
HOZUMI, N. & TONEGAWA, S. (1976). Proc. natn. Acad. Sci. U.S.A. 73, 3628.
MANIATIS, T., PTASHNE, M., BARRELL, B. G. & DONELSON, J. (1974). Nature 250, 394.
MAJORS, J. (1975). Nature 256, 672.
PTASHNE, M., BACKMAN, K., HUMAYUN, M., JEFFREY, A., MAURER, R., MEYER, B. & SAUER, R. (1976). Science 194, 156.