J. theor. Biol. (1979) 80, 65-82
Four-stranded DNA Structure and DNA Base Methylation in the Mechanism of Action of Restriction Endonucleases†
A. STASIAK AND T. KŁOPOTOWSKI
Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warszawa, Poland
(Received 14 April 1978, and in revised form 5 February 1979)
We examined the probability that short palindromic DNA sequences occur as four-stranded structures held together in double-helical DNA by the additional hydrogen bonds postulated by McGavin (1971). The likelihood that the palindromes fold at their symmetry axes to allow the additional hydrogen bonding was considered using published physicochemical evidence and theoretical deductions. We deduced that both in vivo and in vitro the requirements may be met for duplex DNA folding which would bring the complementary base pairs of a palindrome together and thus allow the formation of the additional hydrogen bonds. However, we propose that the hydrogen bonding between guanine-cytosine base pairs differs from that proposed by McGavin. Using CPK atom models we found that formation of this tertiary conformation, already proposed by other authors and which we call the cage structure, may be prevented or hindered by adenine, guanine or cytosine methylation. The available experimental data on the recognition and cleavage site specificity of the Type II restriction endonucleases were confronted with the cage model as an alternative to the cruciform model and with the postulated effects of base methylation. The published data did not contradict the validity of the cage model or the role of base methylation in preventing the four-stranded palindrome structure. The applicability of the basic ideas of four-stranded DNA and of the base methylation effect to the mechanism of action of modification methylases and other restriction endonucleases is briefly discussed, but only tentative conclusions could be reached.
1. Introduction
†The recognition sequences are shown throughout this paper using the conventions adopted by Roberts (1977). All start with the 5′-deoxynucleotide. The methylation and cleavage sites, when known and relevant, are indicated with an asterisk or an arrow, respectively.
0022-5193/79/170065+18 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.
Evidence is available for more than a hundred procaryotic organisms and their plasmids that they encode enzymes for restricting foreign DNA by endonucleolytic cleavage of diester phosphate bonds (Roberts,
1976; 1977). The ATP-independent, or Type II, restriction endonucleases recognize and cleave specific DNA sequences composed of four, five or six deoxynucleotide pairs. Almost all of these sequences are palindromic in the sense that at equal distances from the symmetry axis one finds bases able to form Watson-Crick hydrogen bonds with each other. A few atypical ATP-independent endonucleases recognize non-palindromic sequences and cleave DNA several bases away from their recognition sites. The ATP-dependent, or Type I, endonucleases also recognize non-palindromic sequences and cleave DNA as far as several thousand base pairs away.
Bacterial and plasmid chromosomes code for enzymes which modify host DNA and thereby protect it against the host's own restriction enzymes. In all known cases such protective modification is achieved by methylation of DNA bases.
Although all four dispositions of DNA base pairs are distinguishable from either the major or the minor groove of double-helical DNA, it has generally been assumed, since the original proposal by Gierer (1966), that local fluctuations of tertiary DNA conformation facilitate recognition of specific DNA sites by interacting proteins (Jovin, 1976). The fluctuations are likely to involve identical DNA sequences. Gierer (1966) postulated that palindromic sequences may exist as double hairpin-like protrusions of the constituent strands with bases held together by Watson-Crick hydrogen bonds. Such a cruciform structure is shown in Fig. 1.
An alternative palindrome structure was suggested by McGavin (1972) and described in more detail by Lim & Mazanov (1976, 1978). These authors assume that the purine and pyrimidine bases of double-helical DNA are capable of forming additional hydrogen bonds with the same partner specificity as that in Watson-Crick pairing, as originally proposed by McGavin (1971). Owing to the additional hydrogen bonds the palindromes may assume a four-stranded conformation, like that shown in Fig. 2.
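The symmetry criterion given earlier in this section for a palindromic recognition site can be made concrete with a short sketch. It is only an illustration in Python under stated assumptions: the function name is hypothetical, and the central base of an odd-length site is ignored, so that five-base pair sites of the CC(A/T)GG kind count as palindromic.

```python
# Watson-Crick partners of the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_palindromic(seq: str) -> bool:
    """Return True if bases at equal distances from the symmetry axis
    are Watson-Crick partners (hypothetical helper, not from the paper).

    Assumption: for odd lengths the central base lies on the axis and
    is not checked.
    """
    s = seq.upper()
    n = len(s)
    return all(COMPLEMENT[s[i]] == s[n - 1 - i] for i in range(n // 2))


if __name__ == "__main__":
    # AAGCTT and GAATTC are the sites quoted for HindIII and EcoRI;
    # GGTGA is the non-palindromic HphI site mentioned later in the text.
    for site in ("AAGCTT", "GAATTC", "CCAGG", "GGTGA"):
        print(site, is_palindromic(site))
```

Under this definition the six-base pair and five-base pair sites pass the test, while the HphI site does not.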
FIG. 1. Schematic representation of the hairpin structure of the sequence AAGCTT recognized by HindIII.
Our own considerations of the hydrogen-bonding properties of DNA bases led to similar but not identical conclusions. The difference is that the GC base pairs of the four-stranded structures are held together by hydrogen bonds other than those proposed by McGavin. In addition, we inferred that methylation of DNA bases makes formation of the additional hydrogen bonds impossible or more difficult. In this paper we present our arguments in favour of the four-stranded structure of the sites recognized and cleaved by most of the Type II restriction endonucleases. We also propose that methylation prevents formation of such a structure and thereby protects modified DNA against restriction. Finally, we attempt to apply the same principle to the other endonucleases, which cleave DNA away from their recognition sites.
2. Criticism of the Cruciform Model
Owing to the greater spatial exposure of DNA sequences in the hairpins as compared with linear double-helical sequences, the cruciforms could be more suitable for recognition by specific proteins, including restriction endonucleases. However, several facts or well-founded presumptions call into question whether this DNA conformation is likely to occur or to be a suitable substrate for these enzymes.
1. Formation of the hairpins absolutely requires local DNA denaturation along at least the whole symmetric segment. Under physiological conditions of ionic environment and temperature this would require a special mechanism to overcome the natural restraints. Murray & Old (1974) considered the hairpins made of six-base pair palindromes to be improbable structures. Even endonuclease-induced cruciform formation seems to be very unlikely.
2. Occurrence of randomly dispersed cruciforms in native DNA would complicate exposure of single strands for such processes as DNA transcription or replication. Consequently, these processes could operate only if a mechanism for denaturing the hairpins were available.
3. Only a few restriction endonucleases are able to act upon single-stranded DNA (Blakesley & Wells, 1975), although such DNA can generate hairpins, at least from longer symmetrical sequences (Gray, Sommer, Polke, Beck & Schaller, 1978).
4. The cruciform model of palindromic structures does not provide any inherent explanation why methylation of adenine or cytosine affects recognition or cleavage by restriction endonucleases in all organisms for which the relevant data are available.
5. Many restriction endonucleases produce DNA fragments with sticky ends, i.e. carrying complementary single-stranded sequences. In a cruciform such ends could be formed only by two relatively distant and presumably separate cuts, one in each hairpin. A model proposing sticky end formation by a simpler cut of two diester bonds opposing each other would be more attractive.
6. The cruciform could not be detected in native DNA (Wilson & Thomas, 1974). No physical evidence for symmetrical hairpins in double-stranded DNA has been reported so far. Calculations of the helix-cruciform transition led von Heijne & Blomberg (1976) to the conclusion that cruciforms are unlikely to be DNA structures which are recognized by proteins.
These shortcomings of the cruciform model make the occurrence of such structures in native DNA or DNA fragments unlikely. Moreover, even if they occurred they would be of little help in explaining the experimental data obtained with restriction endonucleases, especially those on the role of base methylation.
FIG. 2. Schematic representation of the cage structure of the sequences recognized by HindIII, AAGCTT (left), and EcoRII, CC(A/T)GG (right).
This conclusion led us to consider the alternative model of the tertiary conformation of palindromic sequences.
3. The Four-stranded Palindrome Structure and Role of Base Methylation in Preventing its Formation
As already mentioned, the hypothesis of the four-stranded conformation of palindromic sequences was developed by Lim & Mazanov (1976, 1978). We started our evaluation of its adequacy by re-examining the additional hydrogen bonds proposed by McGavin (1971) and the feasibility of palindrome folding. We paid special attention to the role of base methylation. For that
purpose we first used the Corey-Pauling-Koltun space-filling atom blocks.
(i) The additional hydrogen bonds. We examined the possibility of such bond formation in all juxtapositions of AT and GC base-pair effigies made with the atom blocks. The combinations examined are shown in Fig. 3. As indicated by the dotted lines, the additional hydrogen bonds could be formed by clasping with the appropriate bond blocks in four of the combinations. In three of the tetrads the base pairs were held together by two bonds. The antiparallel GC homotetrad b could be held by either of two alternative types of hydrogen bonds. We assumed, after McGavin (1977), that the overall shape of the tetrads should be identical for regularity of the four-stranded DNA conformation. After examining the rhomboid figures laid out by the glycosidic bonds we found that the postulate is fulfilled by the tetrads shown as a and e and by one variant of the tetrad b. We conclude that regular four-stranded structures can be formed by three types of tetrads: (1) the antiparallel AT homotetrad, in which the base pairs are held together by two additional hydrogen bonds between adenine N6 atoms and thymine O6 atoms, (2) the antiparallel GC homotetrad, held by two bonds between guanine N7 atoms and cytosine N6 atoms, and (3) the antiparallel AT/CG heterotetrad, in which the base pairs are held by only one additional hydrogen bond, between the adenine N7 atom and the cytosine N6 atom. It should be noted that Kubitschek & Henderson (1966) and McGavin (1971; 1977) postulated that the same GC tetrad is held by additional bonds between guanine O6 atoms and cytosine N6 atoms.
(ii) The palindrome folding and stability. Helical effigies of several palindromes were constructed. It was found that they could be easily folded provided that the helicity at the bending site had first been distorted without, however, breaking Watson-Crick hydrogen bonds.
In the palindromes folded at the symmetry axis the complementary helical segments adhered to each other along their major grooves. The folding brought together the base atoms participating in the additional bonds described in (i). Special attention was paid to GC tetrad formation. It was observed that the N7-H-N6 bonds were more favourable than the O6-H-N6 bonds: the former formed first during the folding, were better aligned and did not produce the rigidity of the palindrome effigy observed with the latter bonds.
FIG. 3. Schematic representation of possible tetrads, (a) to (f). Dotted lines indicate the additional hydrogen bonds.
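The three regular tetrad types listed in (i) can be illustrated with a minimal sketch of how a folded palindrome might be decomposed into tetrads. This is an interpretation, not the authors' procedure: it assumes that folding at the symmetry axis juxtaposes the base pair at position i with the one at the mirror position, and that the central one (odd length) or two (even length) base pairs at the top of the fold form no additional bonds. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def classify_tetrad(x: str, y: str) -> str:
    """Classify the tetrad formed by the base pairs whose top-strand
    bases are x and y (mirror positions of the folded site)."""
    if COMPLEMENT[x] == y:
        # Two identical base pairs in antiparallel juxtaposition.
        return "AT homotetrad" if x in "AT" else "GC homotetrad"
    if (x in PURINES) != (y in PURINES):
        # One AT and one GC pair, purine facing pyrimidine.
        return "AT/CG heterotetrad"
    # Two purines (or two pyrimidines) on the same strand.
    return "irregular"


def cage_tetrads(seq: str) -> list:
    """List the tetrads of a folded site, outermost first.

    Assumption: the top base pairs (one for odd, two for even length)
    are excluded because they cannot form the additional bonds.
    """
    s = seq.upper()
    n = len(s)
    top = 1 if n % 2 else 2
    return [classify_tetrad(s[i], s[n - 1 - i]) for i in range((n - top) // 2)]


if __name__ == "__main__":
    for site in ("AAGCTT", "CCAGG", "CCGG", "CTCGGG", "CACGGG"):
        print(site, cage_tetrads(site))
```

In this sketch a six-base pair site yields two tetrads and a four-base pair site only one, and a site with two purines at mirror positions on the same strand is reported as irregular.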
The folded palindrome effigies are presented schematically in Fig. 2. Unlike the cruciform presentations, the palindromes are drawn as single protrusions of four-stranded structures resembling bird cages. For convenience they will be called the cage structures, also when meant to occur in native DNA. It can be seen that the top base pairs, one in the odd-numbered and two in the even-numbered palindromes, could not form any intrastrand hydrogen bonds. The ability of the other base pairs to form such bonds was influenced by both their position and their chemical nature. Owing to effigy rigidity, clasping with the hydrogen-bond blocks was more difficult in the sub-top position, especially in odd-numbered palindromes. If two AT base pairs were on the top, the thymine methyl groups protruded toward the sub-top base pairs and perceptibly hindered their clasping. Clasping was unhindered when GC base pairs were on the top. The clasping of AT pairs was slightly more difficult in any cage position owing to a slight hindrance by the thymine methyl group. After either type of hindrance caused by the thymine methyl group had been overcome, whether the one inflicted from the top upon the underlying tetrad or the intratetrad hindrance, the clasping introduced a slight but perceptible strain into the CPK cage structure effigies.
These observations imply that AT tetrads contribute less to the stability of the four-stranded structure than do GC tetrads. The effigies of four-base pair palindromes were hardly held together in the cage conformation. The single pair of intrastrand hydrogen bonds in the strained sub-top position could ensure only limited stability for these palindromes. It seems likely that certain conditions in the surrounding DNA sequences must be met to increase the lifetime of their cage conformation.
(iii) The role of modified DNA bases.
Examination of the palindrome effigies containing modified bases indicated that N6-methyladenine and N7-methylguanine could not make the additional hydrogen bonds, because their atoms participating in such bonds were occupied by the methyl groups. C5-Methylcytosine was able to form the additional bond with guanine, but its methyl group caused an intratetrad steric hindrance. The AT and GC tetrads shown in Fig. 4 may help in seeing the difference between the relevant properties of the methylated purines and C5-methylcytosine. Another type of hindrance was observed when the modified bases were in the top base pairs. The methyl groups protruded toward the sub-top base pairs and hindered hydrogen bond formation. The stronger effect was observed with N6-methyladenine or N7-methylguanine, whereas those caused by the methyl groups of C5-methylcytosine or thymine were weaker, but still perceptible. The effects of the modifying methyl groups in the effigies will be used in Table 1 for explaining the efficiency of their protective role against cleavage by restriction endonucleases. The effigy construction and examinations indicated that the four-stranded
FIG. 4. Tetrad elements of the cage structures, AT/AT (left) and GC/GC (right). The arrows indicate regions occupied by methyl groups which prevent or hinder the tetrad formation. Dashed arrows indicate the place of modification proposed by McConnell, Searcy & Sutcliffe (1978).
cage conformation is sterically feasible and that the real existence of such structures in native DNA can be taken into consideration.
The cage conformation displays numerous advantages over the cruciform as a naturally exposed element of double-helical DNA. Its adequacy is discussed below in the same order as the criticism of the cruciform model.
Ad 1. The formation of the cage structure does not require preceding strand separation. It could be formed by DNA folding at the symmetry axis of palindromic sequences followed by spontaneous formation of the additional, intrastrand hydrogen bonds. Very likely some factors, discussed in the next section, are obligatorily required for, or facilitate, the assumption of the cage conformation by palindromic DNA sequences.
Ad 2. The cage configuration would be maintained by weaker forces than the cruciform. Restoration of palindrome linearity requires breaking one hydrogen bond per base pair in the cage structure and either two (AT) or three (GC) hydrogen bonds per base pair in the cruciform. One can therefore suppose that the cage structure should not seriously interfere with processes requiring single-strand exposure, like DNA transcription or replication.
Ad 3. Single-stranded DNA cannot form the cage structures, and one should therefore expect that restriction and modification enzymes will not use it as a substrate, which was found to be the case with most of these enzymes (Murray & Old, 1974).
Ad 4. As found with the palindrome effigies, methylation of adenine or cytosine in positions known to be substituted by modifying enzymes (Roberts, 1976), or of guanine as has been suggested (McConnell, Searcy &
TABLE 1
Mechanisms of the protective effect of base methylations against palindrome cleavage by ATP-independent restriction endonucleases. All palindromes with known methylation sites and examined for enzyme sensitivity are listed. In some cases the methyl groups were introduced in vitro. The references are indicated as follows: a, Roberts (1977); b, Greene et al. (1975); c, Mann & Smith (1977); d, Pirrotta (1976); e, T. A. Trautner (personal communication); f, Bird (1978); g, Lacks & Greenberg (1977); h, Sutcliffe & Church (1978).
Columns: Enzyme; Sequence; Protective effect; Proposed mechanism.
[Table body: the row alignment is not recoverable; the legible entries are listed by column.]
Enzymes: HindII, HindIII, EcoRI, BamI, HpaII, HaeIII, BsuI, HhaI, BbvSI, EcoRII, DpnII, DpnI, AvaI.
Sequences (legible entries only): GAATTC, GGATCC, AGATCT, GATC.
Protective effect: Yes, No or, in one case, Opposite.
Proposed mechanisms:
Impossibility of the additional hydrogen bond formation by AT base pairs.
Top-methyl inflicted steric hindrance adding to the relatively weak potential of forming the additional hydrogen bonds by sub-top AT base pairs.
Top-methyl inflicted steric hindrance too weak to counteract the potential of forming the additional hydrogen bonds by GC base pairs strengthened by the third tetrad.
Intratetrad steric hindrance too weak to counteract the strong potential of forming the additional hydrogen bonds by GC base pairs.
Intratetrad steric hindrance adding to the weak intertetrad steric hindrance caused by thymine methyl groups.
Top-methyl inflicted steric hindrance strong enough to prevent formation of the additional hydrogen bonds by unsupported GC base pairs.
Intratetrad steric hindrance adding to the strained cage conformation.
Top-methyl inflicted steric hindrance strong enough to counteract the potential of forming the additional hydrogen bonds by unsupported GC base pairs.
Exceptional situation; presumably the enzyme cleaves the linear methylated palindrome.
Top-methyl inflicted steric hindrance adding to the weakness of the heterotetrad held together by only one additional hydrogen bond.
Sutcliffe, 1978) prevents palindrome transition to the cage conformation. It is shown in Table 1 that the protective effectiveness of the methylations against the action of restriction endonucleases is in satisfactory agreement with their destabilizing effect on the cage conformation. Consequently, it seems conceivable that some modification methylases would require for their action that the specific sequences not be in the cage conformation. Hence the cage hypothesis provides an inherent and unitary molecular basis for both restriction and modification mechanisms.
Ad 5. The cage model provides a simpler explanation of how sticky ends are formed by restriction endonucleases. Splitting of two diester bonds belonging to the two constituent strands and located diagonally across the cage would produce sticky ends (see Fig. 2 and Table 2). The distance between them is only a little more than that between opposite diester bonds in double-helical DNA. In the cruciform structure separate single-strand cuts in two double-stranded hairpins would be required for sticky end formation.
Ad 6. Neither hypothetical palindrome conformation has been detected by physical examination of native double-stranded DNA. This argument is less damaging to the cage model. First, it seems unlikely that anybody has yet attempted to find such structures. Second, owing to their transient character, discussed in the next section, they may not survive specimen preparation for most such examinations.
Summarizing the above arguments, one may state that the cage model of the palindromes cleaved by restriction endonucleases seems to be more probable than the cruciform model or the simple assumption of a linear, double-helical palindrome conformation.
4. Probability of Cage Structure Occurrence in Native DNA
The possibility of folding palindrome effigies made of CPK blocks without breaking Watson-Crick bonds does not necessarily mean that the juxtaposition of the complementary base pairs of a palindrome can occur in native DNA. The formation of cage structures seems to require overcoming the natural stiffness of the double helix and the forces which prevent DNA from assuming a more condensed tertiary conformation. The study of the so-called Ψ transition is one of the most rewarding experimental approaches to the problem of how compact DNA structures can be formed and persist in phage heads or eucaryotic chromosomes. Lerman (1973) concluded that phage T4 DNA could become, after the Ψ transition, nearly as compact as in the phage heads. Experiments with intercalating dyes, which increase DNA stiffness and prevent the Ψ transition (Cheng & Mohr, 1975), confirm the previous suggestion by Maniatis, Venable & Lerman (1974) that the Ψ transition consists in regular
TABLE 2
The sequences recognized by, and the kinds of cleavage-product ends of, the Type II restriction endonucleases. The sequences and sites of methylation or cleavage were taken from Roberts (1977) except for those for DpnII (Lacks & Greenberg, 1977) and BsuI (T. A. Trautner, personal communication). The sign ′ denotes that the fact or position of base methylation was not established.
Palindrome | Number of base pairs | Methylated base | Type of cleavage | Example of enzyme and sequence
Yes | Six | Adenine | Flush | HindII, GTPy↓PuAC
Yes | Six | Adenine | Sticky | HindIII, A↓AGCTT; EcoRI, G↓AATTC
Yes | Six | Cytosine | Flush | SmaI, CCC↓GGG′
Yes | Six | Cytosine | Sticky | XmaI, C↓CCGGG′
Yes | Five | Adenine | Flush | Impossible and unknown
Yes | Five | Adenine | Sticky | Unknown
Yes | Five | Cytosine | Flush | Impossible and unknown
Yes | Five | Cytosine | Sticky | EcoRII, ↓CC(A/T)GG
Yes | Four | Adenine | Flush | Unknown
Yes | Four | Adenine | Sticky | DpnII, ↓GATC
Yes | Four | Cytosine | Flush | BsuI, GG↓CC
Yes | Four | Cytosine | Sticky | HapII, C↓CGG′
No | | | | HphI, GGTGA; MboII, GAAGA; and MnlI, CCTC′ cleave DNA at a distance from the recognition sites
folding of double-helical DNA. Also in a more recent paper (Griffith, 1978) the Ψ-condensed DNA is considered to be folded into straight segments joined by sharp bends, as in a carpenter's rule. The author contends that 180° folds are in good agreement with his results on nick closure in Ψ-condensed virus DNA by a bacterial DNA ligase.
It was found that denatured, single-stranded DNA cannot form Ψ-DNA (Evdokimov, Akimenko, Glukhova, Tikhonenko & Varshavsky, 1973). On the other hand, two factors known to stabilize the double-helical DNA structure, high salt concentration (Lerman, 1971) and high GC base content (Cheng & Mohr, 1975), favour the Ψ transition. It seems reasonable, then, to presume that undistorted double-helical DNA is more apt to undergo the
condensing process, possibly owing to an increased DNA-DNA adhesiveness. It was found that the interhelix spacing of Ψ-DNA is twice the distance for van der Waals' contact (Maniatis et al., 1974). DNA is known to be hydrated, and the space between DNA duplexes in the Ψ phase is certainly occupied by water molecules. Recently, a more specific role for a fraction of these water molecules was proposed by Edwards (1978). In his theoretical study of a four-stranded DNA model he concludes that water could bridge bases of the constituent DNA duplexes by forming hydrogen bonds with them. Unlike the hydrogen bonds proposed here, Edwards' water bridges do not have a clear-cut partner specificity. There is, however, one notable exception: two thymine bases cannot form the water bridge. One can suppose that the DNA-DNA adhesiveness holds together only two, and not more, DNA duplexes and could account for the greater contribution of GC-rich regions to the Ψ transition process.
We postulate that regional dehydration allows the formation of the additional and more specific bonds, which would contribute both to the stability and to the specificity of the tetraplexed regions. Johnson & Morgan (1978) reported that a four-stranded conformation of synthetic polydeoxynucleotides was very stable, as their melting temperatures were higher than those of their double-helical components; e.g. for species with an AT/GC ratio of one, the temperatures were 93°C and about 50°C, respectively. It should be noted that the melting temperature of Ψ-condensed DNA is lower than that of double-helical DNA (Cheng & Mohr, 1975). Therefore, if the additional, specific hydrogen bonds are formed within a DNA molecule or between separate molecules, their contribution to DNA condensation is more than expected from their fractional length.
The Ψ transition is achieved by simple mixing of DNA solutions with those of neutral or acidic polymers and salts.
The polymers are strongly hydrophilic and cause exclusion of DNA molecules from some volume of the solvent space (Lerman, 1973). Water sequestration plays an important role, as similar DNA condensation occurs at moderate ethanol concentrations (Brunner & Maestre, 1974). The same role can be ascribed to proteins when present at an appropriate concentration. The other required component of the Ψ-transition mixtures, the salt, can be either sodium or magnesium chloride. Ott, Ziegeler & Bauer (1975) observed that in the absence of any polymer magnesium salts could induce a DNA condensation which melted at temperatures lower than that for duplex DNA denaturation. Organic compounds with multiple positive charges, like the naturally occurring spermidine (Gosule & Schellman, 1976) or synthetic basic amino acid polymers (Carroll, 1972), can also promote a transition of DNA to a more condensed state. A method for direct visualization of salt-induced DNA condensation was
devised by Vollenweider, Koller, Parello & Sogo (1976). Their electron micrographs show that the condensed linear DNA forms folds at an end of each four-stranded stretch.
The above presentation of the DNA condensing process compels us to assume that both in bacterial cells and in in vitro reaction mixtures intramolecular DNA condensation allows folding of DNA molecules. The probability of bringing together two complementary sites on the same duplex DNA molecule or on separate molecules should be higher in the former case owing to the enhanced local concentration effect. The enhancement is obviously the greatest when a palindrome is folded at its symmetry axis.
In the next step the repulsion caused by the negative charges of the phosphate groups must be overcome. In their theoretical study of the forces resisting DNA packaging into phage T4 heads, Riemer & Bloomfield (1978) calculated that the energy input for overcoming the repulsion is two orders of magnitude higher than that for duplex DNA bending. They estimated that the free energy of DNA-polyamine interaction is equivalent to the repulsive forces. The same role can be played by magnesium and univalent cations, which are present in all incubation mixtures for DNA digestion by Type II endonucleases at concentrations often approaching those used for the Ψ transition. Ott et al. (1975) stated explicitly that the DNA condensation induced by magnesium in the absence of any other polymer was intramolecular.
Overcoming the repulsion would allow formation of the water-bridged DNA tetraplex, permitting sliding of the DNA molecule ends along each other. The sliding would oscillate around the most favourable structure determined by the additional hydrogen bonds formed in dehydrated regions.
The oscillation would transiently expose the palindromic sequence to the molecule's fold, and the intrastrand hydrogen bonds would extend the lifetime of the overall structure, which would be the actual substrate for the respective restriction endonuclease.
An alternative for the last step of cage structure formation should also be considered. It assumes that palindromes exposed to the duplex bends may assume the cage conformation only when bound to, and under the influence of, an endonuclease. However, this would mean that the enzymes recognize an unfinished or distorted cage conformation, which would be incompatible with the sharp discrimination of even half-methylated palindromes. Therefore we favour the former alternative.
5. Mechanism of Action of Restriction Endonucleases
In the first part of this section the cage palindrome model is confronted with the available data on DNA sequence recognition and cleavage by typical ATP-independent restriction endonucleases. About 80 Type II
endonucleases with known recognition specificity were listed by Roberts (1977). Three of them, HphI (GGTGA), MboII (GAAGA) and MnlI (CCTC), are atypical as they cleave DNA away from their non-palindromic recognition sites. All typical ATP-independent restriction endonucleases cleave short palindromes up to six base pairs long.
In a few cases the enzymes recognize the bases only as purines or pyrimidines: AvaI, CPyCGPuG (Sutcliffe & Church, 1978), AcyI, CPuGCPyC, and HaeII, PuGCGCPy (de Waard, Korsuize, van Beveren & Maat, 1978). According to the cage hypothesis, juxtaposition of AT and CG base pairs allows formation of a tetrad held by only one additional hydrogen bond, between adenine and cytosine, but having the same shape as the AT or GC tetrads. Although weak, the abnormal tetrads do not distort the cage structures. Palindromes having in the ambiguous symmetrical locations two purines on one strand and two complementary pyrimidines on the other, e.g. CPuCGPuG, could not assume the regular cage conformation, but fortunately for the cage hypothesis they have not been found thus far. It seems significant that all unambiguous positions in the three palindromes are occupied by GC base pairs. Their relative strength should compensate for the weakness of the AT/CG tetrad. These palindromes seem more difficult to reconcile with the cruciform model.
The cage model does not necessarily require that the base pairs spanning the symmetry axis be complementary, as is the case with almost all known six-base pair recognition sequences. The exceptional sequences are those of HindII (Roberts, 1976) and BpaI (Roberts, 1977). However, the central location of two purines on one strand and two complementary pyrimidines on the other would hinder the folding and the formation of a regular cage conformation.
It seems likely that the symmetry of the cage structures allows their recognition from either side by randomly diffusing enzyme molecules, which doubles the enzymes' efficiency and constitutes a selective advantage in evolution. None of the palindromic sequences is composed of only adenine and thymine, although those of only guanine and cytosine are quite common. This could be due to the weakness of the additional hydrogen bonds between adenine and thymine residues.
Some restriction endonucleases are capable of cleaving single-stranded DNA. This is the case for HaeIII, GG↓CC (Horiuchi & Zinder, 1973), HhaI, GCG↓C, and SfaI, GGCC (Blakesley & Wells, 1976), but not for EcoRI (Greene et al., 1975). However, only single-stranded forms of phage DNA, and not denatured double-stranded DNA, are cleaved. This points to some special structural requirements for the enzymes to act on single-stranded DNA. We presume that the single-stranded symmetrical sequences can form minute
hairpins. Being halves of the cage structures, they can be accommodated in the active sites of the restriction endonucleases and cleaved.
We propose that the recognition of the cage palindromes involves three features: the base sequence, the cage length and the regularity of the constituent tetrad elements. Sequence recognition is possible only within the minor grooves, since the major grooves are filled up. The enzyme interacting with the minor groove finds there elements different enough to distinguish between AT and GC tetrads. They can be seen in Fig. 3 as the substituents at the C2 atoms of the purine and pyrimidine bases.
The importance of the cage length is probable in at least one example. It was reported that AluI, AG↓CT, had not cleaved the sequence recognized by HindIII, A↓AGCTT (Subramanian, Dhar & Weissman, 1977). It follows that AluI either cleaves only the linear AGCT sequence or acts only when the sequence assumes a cage conformation of definite length.
The significance of the regular tetrad shape can be deduced from the effects of base methylation on the susceptibility of the palindromes to the endonucleolytic action of the restriction enzymes. In Table 1 the DNA sequences with known methylation sites are listed, and the effect of the modification on the susceptibility is indicated. In the rightmost column the effect, or its absence, is interpreted in terms of the cage hypothesis. In most cases the protective effect was due to intertetrad hindrance, which could either prevent the cage conformation or distort the shape of one of its tetrad elements. Less frequent were the instances in which the methylation caused intratetrad hindrance or made it altogether impossible to form the hydrogen bonds essential for tetrad stability or occurrence.
However, protection of certain palindromes by single pairs of methyl groups would be incompatible with the cage model. For example, CCCGGG (SmaI, XmaI) and CCGCGG (SacII, SstII, TglI) should form very stable cage conformations.
One 5-methylcytosine in each strand would not appreciably distort any tetrad, because of the overall stability ensured by the other GC tetrads. The way such sequences are naturally protected has not been described so far. Obviously, they may not occur in organisms producing enzymes with the substrate specificity of SmaI or SacII. Other possibilities predicted on the basis of the cage hypothesis are that the modifying enzyme methylates multiple cytosine residues at the C5 position or, alternatively, that the protective modification is achieved by methylation of the N’ atom of guanine or the N6 atom of cytosine. Either of the latter two methylations would completely prevent the additional hydrogen bonding of the GC tetrad, a loss which would not be compensated by the other tetrads. N’-Methylguanine occurs in Thermoplasma acidophilum, which contains an endonuclease acting on the CGCG sequence (McConnell, Searcy & Sutcliffe, 1978).
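The sequence relationships used in the argument above can be checked mechanically. The following is a minimal sketch, assuming only standard Watson-Crick complementarity; the helper names are hypothetical and do not come from the paper. It confirms that the four-base site AGCT is the central part of the six-base site AAGCTT, and that the GC-rich sites discussed above are true palindromes.

```python
# Illustrative helpers; the names are assumptions, not taken from the paper.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindrome(seq: str) -> bool:
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


ALUI_SITE = "AGCT"        # four-base site quoted in the text
HINDIII_SITE = "AAGCTT"   # six-base site quoted in the text

# Both sites are palindromes, and the shorter one starts at
# offset 1 of the longer one (0-based), i.e. it is its central part.
print(is_palindrome(ALUI_SITE), is_palindrome(HINDIII_SITE))
print(HINDIII_SITE.find(ALUI_SITE))

# The GC-rich sites discussed in the text are palindromes as well.
for site in ("CCCGGG", "CCGCGG", "CGCG"):
    print(site, is_palindrome(site))
```

The check says nothing about conformation; it only makes explicit that the two sites share the same central tetranucleotide, so that any difference in cleavage must come from something other than the core sequence.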
As already mentioned, the cage model predicts that the formation of sticky ends is accomplished by cleavage of two diester bonds located diagonally in the cage structure. Table 2 presents examples of different kinds of Type II restriction endonucleases, classified by the primary structure of their recognition sequences and the sites of the protective methylations. Their capability of producing DNA fragments with either flush or sticky ends is indicated. Out of the 12 possible combinations, 8 are known to occur. Enzymes which produce flush-end fragments from four-base pair sequences, or sticky-end fragments from five-base pair palindromes, when adenine is the potential methylation site are not known. The model does not allow any prediction concerning the possibility of such combinations. Also unknown are enzymes which make flush-end fragments by cleaving five-base pair palindromes, regardless of whether adenine or cytosine is the potential methylation site. In this case the model provides a specific prediction: because in these palindromes any diagonal cut produces sticky-end fragments, endonucleases giving flush-end products are unlikely to exist. One can assume that there is an evolutionary relationship between the ATP-independent restriction endonucleases recognizing and cleaving palindromic sequences and the other kinds of restriction enzymes. However, the data on the primary structure of the sequences recognized by these enzymes are very scarce. Therefore, the following considerations should be understood as a tentative approach only. The best known Type I endonuclease of E. coli B, EcoB, recognizes the sequence 5'-TGANNNNNNNNTGCT-3' (Ravetch, Horiuchi & Zinder, 1978; Lautenberger et al., 1978). As shown by Bickle, Brack & Yuan (1978), formation of the specific DNA complex with another Type I endonuclease, EcoK, requires the enzyme to be allosterically changed by S-adenosylmethionine and ATP.
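Returning to the prediction made above for five-base pair palindromes, the arithmetic behind it can be sketched as follows. This is only an illustration under a stated assumption: a "diagonal" cut is modelled here as a twofold-symmetric pair of cuts, each made the same number of bonds from the 5' end of the site on its own strand. The function names are hypothetical.

```python
def overhang(site_length: int, cut_offset: int) -> int:
    """Signed length of the single-stranded end left by a symmetric cut.

    Assumption: each strand is cut `cut_offset` bases from the 5' end
    of the recognition site, read on that strand.
    Positive = 5' overhang, negative = 3' overhang, 0 = flush ends.
    """
    if not 0 <= cut_offset <= site_length:
        raise ValueError("cut_offset must lie within the site")
    return site_length - 2 * cut_offset


def flush_cut_offsets(site_length: int) -> list:
    """All symmetric cut positions that would give flush ends."""
    return [k for k in range(site_length + 1)
            if overhang(site_length, k) == 0]


# AG^CT (four-base site, cut after 2 bases): flush ends.
print(overhang(4, 2))
# A^AGCTT (six-base site, cut after 1 base): four-base 5' overhang.
print(overhang(6, 1))
# Even-length sites have exactly one flush position; a five-base
# site has none, because 5 - 2k is never zero for integer k.
for n in (4, 5, 6):
    print(n, flush_cut_offsets(n))
```

Under this simplified geometry a flush cut requires the site length to be even, which is the reason a symmetric cut in a five-base pair palindrome always leaves sticky ends.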
In the presence of excess ATP a DNA loop is formed and the DNA is cleaved at apparently random sites up to about 7000 base pairs away from the recognition sequence. Methylation of adenine bases, one on the strand shown above and the other facing the 5'-proximal thymine on the complementary strand, does not interfere with EcoB binding to the non-palindromic, 15-base pair long recognition sequence, but prevents the loop formation and DNA cleavage. We presume that EcoB promotes sliding of duplex DNA against the recognition sequence until a region of a certain, most likely limited, homology is juxtaposed to it. The homology recognition is supposed to be due to the additional hydrogen bonds. Formation of the four-stranded structure triggers the cleavage of the duplex DNA opposed to the recognition sequence. Because of the limited homology many different sequences can be cleaved, which could explain the apparent
unspecificity of cleavage sites. The cleavage geometry is not diagonal as proposed for the ATP-independent endonucleases, because the DNA duplex opposed to that carrying the recognition sequence would be cleaved in trans. The atypical ATP-independent endonucleases cleave DNA only several base pairs away from their non-palindromic recognition sequences. Because they do not require ATP, an active search for a region of homology seems to be ruled out. Possibly, rare contacts with sequences of certain homology trigger the cleavage, but in cis. These speculations were guided by an attempt to find a common denominator for the endonucleases of the three kinds already identified. The applicability of the cage palindrome model to the mechanism of action of modification methylases is briefly discussed below. The effigy constructions indicated that the potential methylation sites are located in the major groove of helical DNA. The sites are not accessible in the cage conformation, as the major grooves are filled up. It follows that the enzymes probably act on linear palindrome forms or on distorted cage structures. Such a prevalence of the linear form or of the distortion is expected to exist in the physiological substrate of the methylases, i.e. in newly synthesized DNA in which the new strand is unmethylated. It seems reasonable to postulate that the Type II modifying enzymes should have a higher affinity for half-methylated DNA, as was already demonstrated for a Type I methylase activity (Meselson & Yuan, 1968).
6. General Discussion
In this paper we argue that the palindromic sequences recognized by the Type II restriction endonucleases can assume, spontaneously or upon combining with the respective enzymes, a tertiary conformation which we call the cage structure. We also suggest that similar four-stranded structures may be involved in the mechanism of action of the Type I restriction endonucleases. As originally proposed by McGavin (1971, 1972) and Lim & Mazanov (1976, 1978), the structure would be held together by the additional, intrastrand hydrogen bonds. Our contribution to this hypothesis rests on the deduction that adenine, guanine or cytosine methylation within these sequences can prevent the formation of cage structures. This allowed us to check specific predictions against published data. We are convinced that the available experimental data do not contradict the cage model, but rather are in agreement with the hypothesis. We presume that the essentials of the hypothesis, namely the additional hydrogen bonds between DNA bases already paired by Watson-Crick bonds and the role of base methylation, can be applied to relate the tertiary DNA structure to processes
other than restriction and modification. McGavin (1977) has already proposed that homologous pairing in recombination may involve the formation of additional bonds between two double-stranded DNA molecules. Taking advantage of the fact that E. coli mutants deficient in methylation of the GATC sequence have an increased ability to recombine (Konrad, 1977), we approach the recombination process using our original deduction that DNA base methylation prevents the formation of four-stranded DNA structures (Stasiak & Kłopotowski, in preparation). We also consider other applications of the prevention of additional hydrogen bonding by methylation of DNA or RNA bases, e.g. in the functions of palindromic sequences involved in the initiation of DNA replication or transcription, and of repeat sequences in the stability of eucaryotic folded DNA.

We gratefully acknowledge the interest and valuable suggestions by K. L. Wierzchowski, A. Rabczenko and M. Fikus, which allowed presentation of this paper. We also thank W. Filipowicz, who was first to encourage this study. This work was supported by the Polish Academy of Sciences within the Project 09.7.
REFERENCES
BICKLE, T. A., BRACK, C. & YUAN, R. (1978). Proc. natn. Acad. Sci. U.S.A. 75, 3099.
BIRD, A. P. (1978). J. molec. Biol. 118, 27.
BLAKESLEY, R. W. & WELLS, R. D. (1975). Nature 257, 421.
BRUNNER, W. C. & MAESTRE, M. F. (1974). Biopolymers 13, 345.
CARROL, D. (1972). Biochemistry 11, 421.
CHENG, S. M. & MOHR, S. C. (1975). Biopolymers 14, 663.
DE WAARD, A., KORSUIZE, J., VAN BEVEREN, C. P. & MAAT, J. (1978). FEBS Lett. 96, 106.
EDWARDS, P. A. (1978). J. theor. Biol. 70, 323.
EVDOKIMOV, Y. M., AKIMENKO, N. M., GLUKHOVA, N. E., TIKHONENKO, S. S. & VARSHAVSKY, A. M. (1973). Mol. Biol. SSSR 7, 151.
GIERER, A. (1966). Nature 212, 1480.
GOSULE, L. C. & SCHELLMAN, J. A. (1976). Nature 259, 333.
GRAY, C. P., SOMMER, R., POLKE, C., BECK, E. & SCHALLER, H. (1978). Proc. natn. Acad. Sci. U.S.A. 75, 50.
GREENE, P. J., POONIAN, M. S., NUSSBAUM, A. L., TOBIAS, L., GARFIN, D. E., BOYER, H. W. & GOODMAN, H. M. (1975). J. molec. Biol. 99, 237.
GRIFFITH, J. D. (1978). Biopolymers 17, 237.
HORIUCHI, K. & ZINDER, N. D. (1975). Proc. natn. Acad. Sci. U.S.A. 72, 2555.
JOHNSON, D. & MORGAN, A. R. (1978). Proc. natn. Acad. Sci. U.S.A. 75, 1637.
JOVIN, T. M. (1976). Ann. Rev. Biochem. 45, 889.
KUBITSCHEK, H. E. & HENDERSON, T. R. (1966). Proc. natn. Acad. Sci. U.S.A. 55, 512.
KONRAD, E. B. (1977). J. Bacteriol. 130, 167.
LACKS, S. & GREENBERG, B. (1977). J. molec. Biol. 114, 153.
LAUTENBERGER, J. A., KAN, N. C., LACKEY, D., LINN, S., EDGELL, M. H. & HUTCHISON III, C. A. (1978). Proc. natn. Acad. Sci. U.S.A. 75, 2271.
LERMAN, L. S. (1971). Proc. natn. Acad. Sci. U.S.A. 68, 1886.
LERMAN, L. S. (1973). Cold Spring Harbor Symp. quant. Biol. 38, 59.
LIM, V. I. & MAZANOV, A. L. (1976). Dokl. Akad. Nauk SSSR 231, 492.
LIM, V. I. & MAZANOV, A. L. (1978). FEBS Lett. 88, 118.
MANIATIS, T., VENABLE, J. H., Jr. & LERMAN, L. S. (1974). J. molec. Biol. 84, 37.
MANN, M. B. & SMITH, H. O. (1977). Nucleic Acids Res. 4, 4211.
MCCONNELL, D. J., SEARCY, D. G. & SUTCLIFFE, J. G. (1978). Nucleic Acids Res. 5, 1729.
MCGAVIN, S. (1971). J. molec. Biol. 55, 293.
MCGAVIN, S. (1972). Nature 242, 330.
MCGAVIN, S. (1977). Heredity 39, 15.
MESELSON, M. & YUAN, R. (1968). Nature 217, 1110.
MURRAY, K. & OLD, R. W. (1974). Progr. nucl. Acids Res. molec. Biol. 14, 117.
OTT, G. S., ZIEGLER, R. & BAUER, W. R. (1975). Biochemistry 14, 3431.
PIRROTTA, V. (1976). Nucleic Acids Res. 3, 1747.
RAVETCH, J. V., HORIUCHI, K. & ZINDER, N. D. (1978). Proc. natn. Acad. Sci. U.S.A. 75, 2266.
RIEMER, S. C. & BLOOMFIELD, V. A. (1978). Biopolymers 17, 785.
ROBERTS, R. J. (1976). CRC Crit. Rev. Biochem. 4, 123.
ROBERTS, R. J. (1977). In DNA Insertion Elements, Plasmids and Episomes (A. I. Bukhari, J. A. Shapiro & S. L. Adhya, eds), p. 757. New York: Cold Spring Harbor Laboratory.
SUTCLIFFE, J. G. & CHURCH, G. M. (1978). Nucleic Acids Res. 5, 2313.
SUBRAMANIAN, K. N., DHAR, R. & WEISSMAN, S. M. (1977). J. biol. Chem. 252, 355.
VOLLENWEIDER, H. J., KOLLER, T., PARELLO, J. & SOGO, J. M. (1976). Proc. natn. Acad. Sci. U.S.A. 73, 4125.
VON HEIJNE, G. & BLOMBERG, C. (1976). J. theor. Biol. 63, 347.
WILSON, D. A. & THOMAS, C. A., Jr. (1974). J. molec. Biol. 84, 115.