J. Mol. Biol. (1998) 279, 101-116
DNA Binding Discrimination of the Murine DNA Cytosine-C5 Methyltransferase
James Flynn, Ramzi Azzam and Norbert Reich*
Department of Chemistry and Program in Biochemistry and Molecular Biology, University of California, Santa Barbara, CA 93106, USA
Mammalian DNA cytosine-C5 methyltransferase modifies the CpG dinucleotide in the context of many different genomic sequences. A rigorous DNA binding assay was developed for the murine enzyme and used to define how sequences flanking the CpG dinucleotide affect the stability of the enzyme:DNA complex. Oligonucleotides containing a single CpG site form reversible 1:1 complexes with the enzyme that are sequence-specific. A guanine/cytosine-rich 30 base-pair sequence, a mimic of the GC-box cis-element, bound threefold more tightly than an adenine/thymine-rich sequence, a mimic of the cyclic AMP responsive element. However, the binding discrimination between hemi- and unmethylated forms of these DNA substrates was small, as we previously observed at the K_m^DNA level (Biochemistry, 35, 7308-7315 (1996)). Single-stranded substrates are bound much more weakly than double-stranded DNA forms. An in vitro screening method was used to select for CpG flanking sequence preferences of the DNA methyltransferase from a large, divergent population of DNA substrates. After five iterative rounds of increasing selective pressure, guanosine/cytosine-rich sequences were abundant and contributed to binding stabilization for at least 12 base-pairs on either side of a central CpG. Our results suggest a read-out of sequence-dependent conformational features, such as helical flexibility, minor groove dimensions and critical phosphate orientation and mobility, rather than interactions with specific bases over the course of two complete helical turns. Thus, both studies reveal a preference for guanosine/cytosine deoxynucleotides flanking the cognate CpG. The enzyme specificity for similar sequences in the genome may contribute to the in vivo functions of this vital enzyme.
© 1998 Academic Press Limited
*Corresponding author
Keywords: cyclic AMP responsive element; epigenetics; GC-box; in vitro selection; transcriptional regulation
Abbreviations used: AdoMet, S-adenosyl-L-methionine; DCMTase, murine DNA cytosine-C5 methyltransferase; CpG, deoxycytidyl-deoxyguanosyl dinucleotide; cpm, counts per minute; CRE, cyclic AMP responsive element; GMSA, gel mobility shift assay; MEL, mouse erythroleukemia cells; PAGE, polyacrylamide gel electrophoresis; UTR, untranslated regions.
0022-2836/98/210101-16 $25.00/0/mb981761

Introduction

Cytosine methylation is the predominant epigenetic modification of eukaryotic DNA. The function most often identified with cytosine-C5 methylation (5-mC) in higher eukaryotes is the regulation of transcription (Jost & Saluz, 1993). Generally, hypermethylated genes are transcriptionally silent, and inheritance of the proper genomic methylation pattern is critical to viable development, as shown by DCMTase gene knockouts in mice (Li et al., 1992). Anti-sense-directed inactivation of DCMTase mRNA (Ramachandani et al., 1997) as well as the incorporation of the cytosine analogs 5-azacytidine and 5-fluorocytidine into DNA (Jones, 1985; Santi et al., 1983) interfere with DCMTase function and lead to cytological dysfunction. To date only a single DNA cytosine-C5 methyltransferase (DCMTase) has been identified in several metazoan organisms (Yoder et al., 1996). However, it is postulated that more than one DCMTase polypeptide can be expressed from the mammalian genome; whether these are derived from different chromosomal loci or are alternate splicings of the same locus is under investigation (Li et al., 1992; Tucker et al., 1996; Yoder et al., 1996;
DNA Binding Studies of a DNA Modifying Enzyme
Carlson et al., 1992; Rouleau et al., 1992; Lei et al., 1996). Although the cellular processes that determine the genomic patterns of DNA methylation are not understood, DCMTase evidently has an essential role in the process. A basic understanding of the binding and catalytic DNA sequence specificity (discrimination) of the enzyme, and of the factors which regulate this specificity, is important. Since the mammalian enzyme is a relatively large, 183 kDa protein (Glickman et al., 1997; Pradhan et al., 1997), DNA sequences flanking the cognate CpG may modulate the ability of the enzyme to methylate particular CpG sites. The CpG flanking sequence preferences of the enzyme (Bestor et al., 1992; Hepburn et al., 1991; Bolden et al., 1986; Pfeifer et al., 1985; Ward et al., 1987; Carotti et al., 1986), and its preference for single- and double-stranded substrates (Carotti et al., 1986; Wang et al., 1984; Pfeifer et al., 1985; Gruenbaum et al., 1982) are often debated and need to be more rigorously addressed. We are developing enzymological methodologies to provide a quantitative description of the DCMTase reaction cycle. A better enzymological description can be used to assess the functional differences of alternately expressed DNA cytosine-C5 methyltransferase enzymes, will provide a sound basis for understanding the importance of DCMTase interactions with other cellular factors and is essential for the characterization of regulators of DCMTase function. We recently demonstrated that the methyl transfer step in the catalytic cycle contributes largely to the preference for hemi-methylated substrates when a highly homogeneous enzyme preparation is used (Flynn et al., 1996). Also, the enzyme exhibits some sequence discrimination at the level of K_m^DNA. Since K_m parameters are often complicated by kinetic terms beyond formation of the initial enzyme:substrate complex, we sought to define the CpG flanking sequence discrimination under conditions limited to this initial complex. Thus, we have extended our studies to include gel mobility shift analyses (GMSA) using defined sequences to estimate K_D^DNA and in vitro screening of a large, divergent pool of DNA to determine binding discrimination. The DCMTase:DNA complex is concluded to be thermodynamically stabilized by guanosine/cytosine-rich sequences flanking a central CpG cognate site.
Results

Gel mobility shift analyses of GC-box and CRE cis-elements

The DNA substrates listed in Table 1 were used because previous authors have concluded that the methylation state of the centrally located CpG within the GC-box and CRE cis-regulatory elements correlates with the transcriptional activity of specific genes (Iguchi-Ariga & Shaffner, 1989; Moens et al., 1993; Joel et al., 1993; Jane et al., 1993). Also, the interactions of DCMTase with single-stranded DNA and with unmethylated and hemi-methylated duplexes have been actively debated and are of biological interest. These three DNA forms are found in the nucleus at particular times in the cell cycle and are known to have different DCMTase catalytic profiles in vitro (Flynn et al., 1996). Our preliminary experiments used a standard gel mobility shift assay in which a constant, low DNA concentration was titrated with higher protein concentrations. As shown in Figure 1A, essentially all of the GC-box a/b (100 pM) binding occurred in a narrow concentration range between 5 nM and 50 nM DCMTase. A sigmoidal dependence on enzyme concentration was observed. An initial complex was formed at the lower DCMTase concentrations and an abrupt shift of most of the free DNA was coincident with the formation of a second complex at about 20 nM DCMTase. Further
Table 1. Synthetic DNA substrates mimicking transcriptional cis-regulatory elements
Deoxyoligonucleotides were synthesized as described in Experimental Procedures. The appropriate consensus is in bold type and the single, centrally located CpG dinucleotide is underlined (mC = C5-methylcytosine). The complementary a, aMET, b and bMET strands were annealed to produce unmethylated, a/b, and hemi-methylated, aMET/b or a/bMET, double-stranded substrates.
Figure 1. Gel mobility shift analysis varying DCMTase with constant GC-box a/b. Reactions were in 100 mM Hepes (pH 7.4), 10 mM EDTA, 10 mM DTT, 200 μg/ml BSA, 5% glycerol using 100 pM 32P-labeled DNA substrate and varying DCMTase concentrations. Reactions were incubated on ice for five minutes and loaded on a 6% 1× TBE PAGE, run at 250 V, 9 mA for two hours at 4 °C. The dried gel was exposed to film overnight. A, Lane 1: 0 nM DCMTase; lane 2: 5.0 nM DCMTase; lane 3: 10 nM DCMTase; lane 4: 20 nM DCMTase; lane 5: 30 nM DCMTase; lane 6: 35 nM DCMTase; lane 7: 40 nM DCMTase; lane 8: 45 nM DCMTase; lane 9: 50 nM DCMTase; lane 10: 65 nM DCMTase; lane 11: 75 nM DCMTase; lane 12: 95 nM DCMTase. B, Lane 1: 0 nM DCMTase; lane 2: 50 nM DCMTase; lane 3: supershift of 50 nM DCMTase plus polyclonal antibody pATH52.
addition of DCMTase resulted in the loss of the more mobile complex I in favor of a less mobile complex II. Similar results were observed for GC-box a/bMET, CRE a/b, and CRE aMET/b (data not shown). The complexes shown in Figure 1A contained the DCMTase, as addition of an antibody to DCMTase resulted in a shift with lower mobility, complex III (Figure 1B). A faint band below complex I did not interact with this antibody. Coincubation of DCMTase and GC-box a/b with a 40-fold excess of unlabeled poly(dA):poly(dT), calculated on a nucleotide basis, did not disrupt the specific DCMTase:DNA complex (data not shown). The multiple banding of DCMTase:DNA complexes observed in Figure 1 is similar to results obtained with the bacterial cytosine DNA methyltransferase M.MspI (Dubey & Roberts, 1992). Reale et al. (1995) obtained similar gel shift results with a proteolyzed mammalian DCMTase and assumed that the slower migrating band contained two DCMTase molecules bound to a single DNA. Steady-state and pre-steady-state kinetic analyses of the DCMTase with the same 30-base-pair DNA substrates used in these studies indicated that K_m^DNA and K_D^DNA are 1000 to 50,000-fold higher (Table 2; Flynn et al., 1996) than the DNA concentrations used to generate Figure 1 and used by Reale et al. (1995). The complexes formed in Figure 1 under limiting DNA and excess protein do not promote a detectable catalytic activity. Therefore, we determined the stability of protein:DNA complexes by keeping the enzyme at a constant concentration (100 nM) and varying the amount of added DNA (Dubey & Roberts, 1992; Yang et al., 1995). The results using this approach for GC-box a/bMET, GC-box a/b and GC-box bMET are shown in Figures 2 through 4 so that the three DNA forms can be compared. In all cases a single shifted band is resolved that is DCMTase-specific. Extraneous bands that are not DCMTase-specific also appear in the control lanes that did not contain DCMTase. Binding isotherm data are fit well by a simple hyperbola, and each complex is saturable. The apparent K_D^DNA estimations for the different forms of GC-box and CRE are summarized in Table 2. The formation of equimolar protein:DNA complexes is supported by comparisons of the shifted band intensity at saturation, using 100 nM
Table 2. Comparison of apparent K_D and K_m parameters for DCMTase under conditions of excess DNA

DNA substrate^a    K_D (μM)^b     K_D (μM)^c     K_m (μM)^d
GC-box a           1.2 ± 0.2      -              -
GC-box b           1.3 ± 0.4      -              -
GC-box bMET        0.88 ± 0.13    -              -
GC-box a/b         0.42 ± 0.15    -              0.80 ± 0.38
GC-box a/bMET      0.36 ± 0.11    0.28 ± 0.04    -
CRE a              >50            -              33 ± 8
CRE b              >50            -              34 ± 17
CRE a/b            1.5 ± 0.4      -              2.8 ± 0.3
CRE aMET/b         1.0 ± 0.3      1.7 ± 0.2      -
a The DNA substrates are described in Table 1. Broken lines are used when a value was not determined and values represent standard errors. b The values presented were determined by gel mobility shift assays as described in Experimental Procedures. c The values presented were determined by pre-steady-state kinetic assays (Flynn et al., 1996). d The values presented were determined by steady-state kinetic assays (Flynn et al., 1996).
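The hyperbolic fit behind the gel-shift K_D column can be sketched as follows. This is a minimal illustration, assuming the shifted-band signal follows signal = bmax × [DNA] / (K_D + [DNA]); the function names, the scan-based least-squares routine and the noise-free input values are hypothetical and are not taken from the Experimental Procedures.

```python
import math


def fraction_bound(dna_uM, kd_uM, bmax):
    """Simple hyperbola: signal = bmax * [DNA] / (KD + [DNA])."""
    return bmax * dna_uM / (kd_uM + dna_uM)


def fit_hyperbola(dna_uM, signal, kd_lo=1e-3, kd_hi=1e3, rounds=8, points=41):
    """Least-squares estimate of KD and bmax by a coarse-to-fine scan over log(KD)."""

    def best_bmax(kd):
        # For a fixed KD the model is linear in bmax, so bmax has a closed form.
        f = [x / (kd + x) for x in dna_uM]
        return sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)

    def sse(kd):
        bmax = best_bmax(kd)
        return sum((y - fraction_bound(x, kd, bmax)) ** 2
                   for x, y in zip(dna_uM, signal))

    lo, hi = math.log(kd_lo), math.log(kd_hi)
    best = lo
    for _ in range(rounds):
        grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
        errors = [sse(math.exp(g)) for g in grid]
        i = errors.index(min(errors))
        best = grid[i]
        # Narrow the search window to the neighbours of the best grid point.
        lo, hi = grid[max(i - 1, 0)], grid[min(i + 1, points - 1)]
    kd = math.exp(best)
    return kd, best_bmax(kd)


# Hypothetical, noise-free band intensities generated from KD = 0.36 uM.
dna = [0.10, 0.28, 0.45, 0.63, 0.80, 1.0, 2.0]
signal = [fraction_bound(x, 0.36, 100.0) for x in dna]
kd_fit, bmax_fit = fit_hyperbola(dna, signal)
print(f"KD = {kd_fit:.3f} uM, bmax = {bmax_fit:.1f}")
```

With real densitometry data the same routine would return an apparent K_D only; standard errors such as those in Table 2 would need a separate error analysis.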
Figure 2. Gel mobility shift analysis varying GC-box a/bMET with constant DCMTase. Reactions were in 100 mM Hepes (pH 7.4), 10 mM EDTA, 10 mM DTT, 200 μg/ml BSA using 100 nM DCMTase and varying concentrations of 32P-labeled DNA substrate. Reactions were incubated on ice for five minutes then loaded on a 6% 1× TBE PAGE and run at 250 V, 9 mA for two hours at 4 °C. The dried gel was exposed to film overnight. Lane 1: 0.050 μM free DNA; lane 2: 0.10 μM free DNA; lane 3: 0.15 μM free DNA; lane 4: 0.10 μM DNA; lane 5: 0.28 μM DNA; lane 6: 0.45 μM DNA; lane 7: 0.63 μM DNA; lane 8: 0.80 μM DNA; lane 9: 1.0 μM DNA; lane 10: 2.0 μM DNA; lane 11: 2.0 μM free DNA. Lanes 1, 2, 3 and 11 are control experiments without added DCMTase.
DCMTase, with the band intensities of control reactions containing 50, 100 and 150 nM DNA and no enzyme (Figures 2 and 3, lanes 1, 2 and 3; see Discussion). DCMTase:DNA complexes formed by high substrate concentrations travel with the same relative mobility as complex I in Figure 1. The complex formed between DCMTase and single-stranded GC-box bMET (Figure 4, lane 7) is shown to migrate to approximately the same distance as the complex formed between DCMTase and hemi-methylated CRE aMET/b DNA (Figure 4, lane 9). The estimates of apparent K_D^DNA determined by gel mobility shift assays (Table 2) are in good agreement with our previous K_m^DNA and K_D^DNA estimates from steady-state and pre-steady-state kinetic assays (Table 2; and Flynn et al., 1996). The hemi-methylated form of DNA was bound with a slightly higher affinity than the unmethylated double-stranded form in gel mobility shift assays
Figure 3. Gel mobility shift analysis varying GC-box a/b with constant DCMTase. Reactions were in 100 mM Hepes (pH 7.4), 10 mM EDTA, 10 mM DTT, 200 μg/ml BSA using 100 nM DCMTase and varying concentrations of 32P-labeled DNA substrate. Reactions were incubated on ice for five minutes then loaded on a 6% 1× TBE PAGE and run at 250 V, 9 mA for two hours at 4 °C. The dried gel was exposed to film overnight. Lane 1: 0.050 μM free DNA; lane 2: 0.10 μM free DNA; lane 3: 0.15 μM free DNA; lane 4: 0.10 μM DNA; lane 5: 0.50 μM DNA; lane 6: 1.0 μM DNA; lane 7: 4.0 μM DNA; lane 8: 6.0 μM DNA; lane 9: 6.0 μM free DNA. Lanes 1, 2, 3 and 9 are control experiments without added DCMTase.
for both the GC-box and CRE DNA. In support of CpG flanking sequence discrimination by DCMTase, the GC-box substrates of each duplex form had an approximately threefold lower K_D^DNA than the corresponding CRE DNA form. Single-stranded substrates bound with less stability than double-stranded DNA, and a GC-box a/b variant that replaced the CpG by TpA produced diffusely shifted bands with an apparent K_D^DNA at least fivefold higher than the "parent" sequence. The binding of CRE single strands was exceptionally poor and at the limits of resolution by this technique. GMSA was capable of resolving a binding discrimination in favor of guanosine/cytosine-rich sequences flanking a central CpG dideoxynucleotide. This finding is confirmed in the following study.
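The fold preferences quoted above follow directly from ratios of the apparent K_D values in Table 2. The short sketch below reproduces that arithmetic; the dictionary layout and the helper name `fold_preference` are illustrative choices, and the standard errors in Table 2 are ignored here.

```python
# Apparent K_D values (uM) from the gel mobility shift column of Table 2.
KD_UM = {
    ("GC-box", "unmethylated"): 0.42,     # GC-box a/b
    ("GC-box", "hemi-methylated"): 0.36,  # GC-box a/bMET
    ("CRE", "unmethylated"): 1.5,         # CRE a/b
    ("CRE", "hemi-methylated"): 1.0,      # CRE aMET/b
}


def fold_preference(kd_weaker_uM, kd_tighter_uM):
    """Ratio of dissociation constants; values above 1 favour the tighter binder."""
    return kd_weaker_uM / kd_tighter_uM


# Sequence discrimination: CRE versus GC-box within each duplex form.
sequence_fold = {
    form: fold_preference(KD_UM[("CRE", form)], KD_UM[("GC-box", form)])
    for form in ("unmethylated", "hemi-methylated")
}

# Methylation-state discrimination: unmethylated versus hemi-methylated.
methylation_fold = {
    element: fold_preference(KD_UM[(element, "unmethylated")],
                             KD_UM[(element, "hemi-methylated")])
    for element in ("GC-box", "CRE")
}

for form, ratio in sequence_fold.items():
    print(f"GC-box over CRE, {form}: {ratio:.1f}-fold")
for element, ratio in methylation_fold.items():
    print(f"hemi-methylated over unmethylated, {element}: {ratio:.1f}-fold")
```

The sequence ratios come out near threefold, while the methylation-state ratios stay below twofold, which matches the qualitative statements in the text.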
Figure 4. Gel mobility shift analysis varying GC-box bMET with constant DCMTase. Reactions were in 100 mM Hepes (pH 7.4), 10 mM EDTA, 10 mM DTT, 200 μg/ml BSA, using 100 nM DCMTase and varying concentrations of 32P-labeled DNA substrate. Reactions were incubated on ice for five minutes then loaded on a 6% 1× TBE PAGE, run at 250 V, 9 mA for two hours at 4 °C and the dried gel was exposed to film overnight. Lane 1: 0.10 μM free DNA; lane 2: 0.20 μM DNA; lane 3: 0.40 μM DNA; lane 4: 0.80 μM DNA; lane 5: 1.6 μM DNA; lane 6: 3.2 μM DNA; lane 7: 6.4 μM DNA; lane 8: 3.2 μM DNA; lane 9: 6.0 μM CRE aMET/b. Lanes 1 and 8 are control experiments without added DCMTase.
Figure 5. A, Randomized DNA substrate used in in vitro screening. The top strand shown was synthesized using β-cyanoethyl phosphoramidite chemistry. The PCR primers used for amplifying the shifted DNA are underlined. Primer C is underlined and contains an EcoRI restriction site. Primer D, underlined, contains a BamHI restriction site and was annealed to the randomized top strand for extension by Klenow polymerase. The randomized positions are denoted as N and are dG, dA or dT on one strand and the complementary dC, dA or dT on the other strand of the duplex. B, Individual isolates cloned and sequenced from the pooled generations. Only the guanine-containing strand is shown for simplicity. Generation-5 members are arranged with the highest guanine content on the 5′ side of the invariant CpG at the top. Frequency information is given for each randomized flank on the appropriate border; an asterisk denotes a single occurrence. DNA from each listed generation was cloned into pGEM11zf- (Promega) by digesting the vector and the pooled DNA with EcoRI and BamHI. Sequencing was performed on both strands using the T7 and SP6 primers with the CircumVent Thermal Cycle kit (New England Biolabs).
Table 3. Binding conditions and gel shift results of in vitro screening

Iterative    DCMTase              DNA                  Percentage     Maximal population
generation   concentration (nM)   concentration (nM)   DNA shifted    complexity
0            -                    -                    -              2.8 × 10^11
1            68                   50                   1.5            4.2 × 10^9
2            68                   25                   12             5.0 × 10^9
3            68                   12.5                 25             1.2 × 10^8
4            5.0                  0.125                <1             1.2 × 10^6
5            0.50                 0.030                <1             1.2 × 10^4
Listed are the enzyme and DNA substrate concentrations used in each round of selection. The Cerenkov cpm within the excised gel slice, containing the shifted complex, is shown as a percentage of the total Cerenkov counts loaded onto the gel. This percentage limits the complexity of the DNA pool; therefore, it is used to calculate the maximal population complexity in each successive generation.
Screening for DCMTase binding discrimination with a randomized DNA pool

The sampling of several discrete sequences for binding specificity is laborious and prone to investigative prejudice. In order to understand the thermodynamic stability of DCMTase:DNA interactions in a diverse population of CpG sequence contexts, as might be expected in vivo, we devised an in vitro screening protocol that exploits the gel mobility shift assay. Sequence degeneracy extended over a 12-base-pair region on each flanking side of an invariant CpG (Figure 5A). To avoid introduction of multiple CpG dinucleotides and limit the complications of assigning which CpG the DCMTase is registering on when bound to DNA containing multiple CpG dinucleotides, only three bases were randomized on the synthetic oligo: adenine, thymine and guanine. This introduces an asymmetry between the strands, as the primer-extended complementary strand contains thymine, adenine and cytosine. The guanine-containing strand is referred to throughout this paper with the understanding that double-stranded DNA was used in the screening. The reaction conditions used for each iterative generation of the screening are summarized in Table 3. The first round of screening contained ten times more DNA molecules than the maximal population complexity of 2.8 × 10^11 discrete sequences; it yielded the generation-1 pool of sequences. An increasing fraction of the DNA pool was shifted through the first three generations, during which the enzyme concentration was kept constant and the DNA concentration was decreased. Our initial choice of conditions was sufficient to stabilize binding of the DNA pool, so the selective pressure to discriminate between sequences was increased in generations 4 and 5 by decreasing both enzyme and DNA concentrations. The maximal population complexity in each generation decreases because only a fraction of the added DNA was shifted. The complexity of the starting population is scaled down by the percentage of DNA shifted in each generation and ultimately results in no more than 1.2 × 10^4 discrete sequences in the generation-5 pool (Table 3).
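The complexity bookkeeping described above can be sketched in a few lines. The starting value is 3^24 (three possible bases at each of the 24 randomized positions), which matches the 2.8 × 10^11 quoted in the text; treating the "<1" entries of Table 3 as an upper bound of 1% is an assumption made here for illustration, and the function name is hypothetical.

```python
def max_complexity(start, percent_shifted):
    """Upper bound on distinct sequences remaining after each selection round."""
    pools = [start]
    for pct in percent_shifted:
        # Only the shifted fraction of the previous pool is carried forward.
        pools.append(pools[-1] * pct / 100.0)
    return pools


# Three bases (G, A, T) at 12 randomized positions on each side of the CpG.
start = 3 ** 24

# Percentage of DNA shifted in generations 1 to 5 (Table 3);
# "<1" is replaced by its upper bound of 1 (an assumption).
percent = [1.5, 12, 25, 1, 1]

pools = max_complexity(start, percent)
for generation, size in enumerate(pools):
    print(f"generation {generation}: at most {size:.2e} sequences")
```

The sketch carries unrounded values forward, so its numbers can differ slightly from the rounded entries in Table 3; the final bound comes out at roughly 1.3 × 10^4, in line with the "no more than 1.2 × 10^4" estimate.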
Individual members from the starting pool and generations 1, 3 and 5 were cloned and sequenced from both strands. Only the guanine-containing strands are shown for simplicity in Figure 5B, but it must be kept in mind that these studies were done using unmethylated double-stranded substrates. Synthesis of the starting population is shown to be randomized at each position with an observed frequency approximating 1/3 each in guanine, adenine and thymine. The selected pools became successively more guanosine-rich with each generation. A total of 49 isolates were cloned and sequenced from the generation-5 pool and none were identical. Nucleotide, dinucleotide and trinucleotide frequencies were analyzed using the Wisconsin Sequence Analysis program COMPOSITION. The selected nucleotides flanking the central CpG dinucleotide were 64.7% in guanine, 13.8% in adenine and 21.6% in thymine. The mean frequency of guanine bases per generation-5 isolate was 14.5 out of the 24 selectable positions and more guanines were observed on the 5′-flank compared to the 3′-flank (P 0.04). The far flanking regions are a full helical turn distal to the invariant CpG and are highly enriched in guanine as compared to regions proximal to the CpG. In addition to the abundance of guanosyl-guanosyl (GpG) dinucleotides, guanosyl-thymidyl (GpT) and thymidyl-guanosyl (TpG) dinucleotides appear often and occur more frequently on the 3′ flank (P 0.01). Trinucleotide analyses reinforce the observations at the nucleotide and dinucleotide levels. The highest frequency of GpGpG was at the far 5′-flank, while GpTpG and TpGpT trinucleotides were far more abundant in the 3′ flank (P 0.01). The discrimination exhibited by DCMTase for generation-5 sequences may reflect important structural characteristics that contribute to stabilization of the initial DCMTase:DNA complex.
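The kind of tally described above (overall base composition and per-position frequencies across aligned isolates) can be sketched as follows. This is not the Wisconsin COMPOSITION program; the function names are hypothetical, and the four short "flank" strings are toy data rather than the generation-5 isolates.

```python
from collections import Counter


def composition(seqs, alphabet="GAT"):
    """Overall percentage of each base across all sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts[base] for base in alphabet)
    return {base: 100.0 * counts[base] / total for base in alphabet}


def position_frequencies(seqs, alphabet="GAT"):
    """Percentage of each base at every aligned position."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    return [
        {base: 100.0 * sum(s[i] == base for s in seqs) / len(seqs)
         for base in alphabet}
        for i in range(length)
    ]


# Toy guanine-strand flanks (hypothetical; real isolates have 24 randomized positions).
toy_flanks = ["GGTG", "GATG", "GGAT", "TGGG"]

overall = composition(toy_flanks)
per_position = position_frequencies(toy_flanks)

print(overall)
for i, freqs in enumerate(per_position, start=1):
    print(i, freqs)
```

Applied to the 49 generation-5 isolates, the first function would give the overall percentages and the second a per-position table of the kind plotted in Figure 6.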
Because each half of the double helix flanking the CpG has been shown to have statistically different features, these analyses suggest that an ideal substrate is likewise asymmetric and that there is a particular binding orientation of DCMTase on DNA. The guanine richness at each randomized position for the generation-5 isolates is best shown in Figure 6. The murine DCMTase is a large
Figure 6. Nucleotide frequency at each randomized flanking position for the generation-5 screening. The bar graph (A) shows the percentage occurrence of each nucleotide at the randomized positions. The predominance of guanosine extends over the entire randomized region. The horizontal line at 33% is representative of the starting pool frequencies. The line at 70% is added as a visual aid. B, The nucleotide percentages at each position.
183,000 Da protein that selected for sequences extending over the entire 12 base-pairs provided for selection on each side of the central CpG. The Wisconsin Sequence Analysis program CONSENSUS was used to construct a common generation-5 sequence with a certainty level of 60%. The sequence GGGGGGGRRKKGCGKGGKGKKGKKGG, where R is guanine or adenine and K is guanine or thymine, was obtained and is shown to highlight the guanine richness and the preference for GpT and TpG on the 3′-side of the CpG. At a certainty level of 80% the plasticity of sequence preferences can be seen close to the invariant CpG: KGGRKKRDDDKRCGKRRDKKKKKKKG (D is guanine, thymine or adenine). We have not tested whether the DCMTase can select for sequences out further than 12 base-pairs or if multiple CpG dinucleotides are preferred over the 26-base-pair expanse. Also, key positioning of cytosine residues dispersed within this strand was not assessed, owing to the limitations imposed.

Similar sequences occur frequently in the genome

We subjected the 49 generation-5 sequences to FASTA searches of the GenBank library to see if similar sequences exist in the genomes of higher eukaryotes. The search was limited in three ways: (1) Only the mouse and human sequences were searched, even though DCMTase activities have been identified in many metazoan organisms. (2) To be considered further, a "hit" had to be identical at 22 of the 26 base positions, including the central CpG. No hits were retrieved that had a higher identity. (3) No gaps in alignment were allowed. Remarkably, 20 "hits" were recovered from GenBank that met our severely restricted criteria. Figure 7 shows the alignments of the five hits from mouse and lists the 15 hits from human. A simplified, random genome would be expected to have a complexity of 4^22, or 1.8 × 10^13 base-pairs, in order to contain any of these sequences just once. Of course, this is an oversimplification, but the results appear striking when considering that the mammalian genome is approximately 3 × 10^9 base-pairs, only about 40% in guanine plus cytosine, and about tenfold deficient in CpG dinucleotides. The majority of hits are in what may be presumed to be regulatory regions of the genome: 5′ or 3′ untranslated regions (UTR) or CpG islands. Many of the associated genes are also of developmental interest. For example, homeo box Hox2.6 and HoxA7 function in early body segmentation. These findings may reflect an intrinsic function of DCMTase in developmental programming.

Figure 7. Genomic sequences similar to the DCMTase selected generation-5 clones. FASTA searches through the mouse and human GenBank libraries produced the following matches when limited to no greater than four mismatches and no gaps. The definitions have been edited from the original entries and CpG dinucleotides are underlined.

Control experiments eliminate a nonspecific selection

Figure 8. Restriction endonuclease challenge with AciI of the starting population, DCMTase selected generations 1 and 3 and a mock generation-3 pool. Equivalent cpm were added to each lane; odd lanes are unrestricted controls and lanes 2, 4, 6 and 8 are AciI restrictions of the starting pool, DCMTase selected generation-1, DCMTase selected generation-3 and mock generation-3, respectively.
A control series of amplifications in the absence of DCMTase was done to show that our iterative PCR conditions were not responsible for the observed guanine selection. Endonuclease challenge of the starting population, DCMTase selected generations 1 and 3 and the mock generation-3 DNA pools was done with AciI (5′-GCGG-3′ restriction) to assess randomness. Although this is a limited sampling, the DNA specificity of AciI can discern the relative abundance of guanosine nucleotides immediately flanking the CpG. After endonuclease challenge of 32P-labeled DNA, the products were resolved on a 12% polyacrylamide gel (Figure 8). As predicted from the sequence data presented in Figures 5B and 6, an increase in restricted DNA is observed in going from the starting population to DCMTase-selected generation-1 and generation-3. The same approximate level of restriction is observed with the mock generation-3 and the starting population, thereby eliminating the possibility that the guanine richness observed after selection was due to the amplification process. The band just below the uncut DNA is presumed to be a gel artifact, as in most cases it also appears in the control lanes. Added proof that the selection was dependent on DCMTase was provided by sequencing the entire mock-selected pool in comparison to the DCMTase selected generation-3 pool, similar to that done by Blackwell et al. (1993). An equal abundance of the randomized nucleotides was resolved for the mock-selected pool and a guanine-rich population was resolved for the DCMTase selection (data not shown).

Sequence discrimination

A better understanding of the DNA sequence contributions to DCMTase binding stabilization
Figure 9. Initial velocity curves of the selected generations. The 50 μl reactions contained 50 nM DCMTase, 7 μM AdoMet and DNA at 4.7, 23, 47 and 230 nM in 100 mM Tris (pH 8.0), 10 mM EDTA, 10 mM DTT, 200 μg/ml BSA. The incubations were for one hour at 37 °C and were processed as described in Experimental Procedures. &, generation-1 pool; ~, generation-2 pool; *, generation-4 pool; }, generation-5 pool.
may provide an insight into the associations this enzyme has with chromatin in vivo. The DCMTase selected DNA from the iterative generations were compared with each other in binding and catalytic assays. DCMTase binds the pooled generation-5 sequences only twofold more tightly than the starting pool (data not shown). The inherent complexity of each pool makes it difficult to assess the true preference for each generation as a whole. The question of sequence specificity was more accurately addressed by GMSA of the discrete sequences, CRE a/b and GC-box a/b. There we found that the guanine/cytosine-rich GC-box was preferred approximately threefold compared to the more adenine/thymine-rich CRE sequence. Initial velocity plots for the starting population and generations 2, 4 and 5 are shown in Figure 9. The catalytic specificity for the selected generations increases at each cycle, with little change in K_m^DNA and a twofold increase in kcat. DCMTase is concluded from these studies to bind and catalyze methylation of CpG dinucleotides more efficiently in the context of guanine/cytosine-rich elements than in adenine/thymine-rich ones.
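The statement that catalytic specificity rises when kcat doubles at constant K_m^DNA can be illustrated with the Michaelis-Menten rate law. The sketch below is a hedged illustration only: the kcat and K_m values are hypothetical placeholders (the pools' fitted parameters are not given in this section), and only the enzyme and DNA concentrations are taken from the Figure 9 legend.

```python
def initial_velocity(dna_nM, kcat_per_h, km_nM, enzyme_nM):
    """Michaelis-Menten initial velocity (nM product per hour)."""
    return kcat_per_h * enzyme_nM * dna_nM / (km_nM + dna_nM)


def specificity_constant(kcat_per_h, km_nM):
    """kcat/Km, the usual measure of catalytic specificity."""
    return kcat_per_h / km_nM


DNA_NM = [4.7, 23, 47, 230]  # DNA concentrations listed in the Figure 9 legend
ENZYME_NM = 50.0             # DCMTase concentration listed in the Figure 9 legend

# Hypothetical parameters: same Km, twofold higher kcat for the later pool.
early_pool = {"kcat_per_h": 1.0, "km_nM": 100.0}
late_pool = {"kcat_per_h": 2.0, "km_nM": 100.0}

for label, p in (("early pool", early_pool), ("late pool", late_pool)):
    rates = [initial_velocity(s, p["kcat_per_h"], p["km_nM"], ENZYME_NM)
             for s in DNA_NM]
    print(label, [round(v, 2) for v in rates],
          "kcat/Km =", specificity_constant(p["kcat_per_h"], p["km_nM"]))

fold_change = (specificity_constant(**late_pool)
               / specificity_constant(**early_pool))
print(f"specificity increases {fold_change:.1f}-fold")
```

With K_m held fixed, the ratio of specificity constants equals the ratio of kcat values, so a twofold gain in kcat translates into a twofold gain in kcat/K_m at every DNA concentration.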
Discussion

A monumental objective in the field of eukaryotic DNA methylation is to understand how cell-specific, genomic methylation patterns are determined throughout the course of development. Because it is the catalytic agent for cytosine methylation, DCMTase clearly has a central role in both maintaining DNA methylation patterns and in establishing new "epi-genotypes". We have been studying how the highly homogeneous and unproteolyzed DCMTase from mouse erythroleukemia cells interacts with DNA, cofactors and other cellular regulators, so that enzyme specificity and catalytic modulation can be quantitatively addressed. The fundamental issues of binding and catalytic discrimination of the mammalian enzyme for different DNA sequences have been actively debated. Many reports have suggested that the ability of the enzyme to methylate the cognate CpG dinucleotide depends to some degree on flanking sequences (Bolden et al., 1986; Bestor et al., 1992; Hepburn et al., 1991; Pfeifer et al., 1985; Ward et al., 1987; Carotti et al., 1986; Smith et al., 1992), while others describe the lack of any flanking sequence effects (Yoder et al., 1997; Bestor & Tycko, 1996; Carlson et al., 1992). These studies used partially purified or proteolyzed enzyme, substrates containing multiple CpG sites, or compared relative velocities obtained at a single DNA concentration. For these reasons, an accurate estimation of specificity (also known as discrimination) was precluded. Similarly, reports regarding the preference of DCMTase for single and double-stranded substrates are also in direct conflict with one another (Adams et al., 1986; Smith et al., 1992; Carotti et al.,
1986; Wang et al., 1984; Pfeifer et al., 1985; Gruenbaum et al., 1982; Christman et al., 1995). Our recent steady-state kinetic analysis with unmethylated GC-box and CRE DNA sequences showed compensatory three- to fourfold changes in K_m^DNA and kcat that resulted in a small discrimination at the level of kcat/K_m^DNA (Flynn et al., 1996). Here, we quantitatively addressed the sequence-dependent discrimination of DCMTase at the level of K_D^DNA. The thermodynamic binding constant, K_D^DNA, is a characteristic of the initial enzyme:DNA complex and K_m^DNA has additional terms accounting for the forward reaction rate. DCMTase:DNA interactions were investigated with discrete DNA sequences of biological importance, and with a large divergent pool of DNA sequences. The discrimination between unmethylated single and double-stranded DNA, and unmethylated and hemi-methylated double-stranded DNA was also quantified.

DCMTase binding to DNA is stabilized by guanine/cytosine-rich sequences

Gel mobility shift assays were used to determine the apparent dissociation constants, K_D^DNA, of the enzyme for different forms of the GC-box and CRE cis-regulatory elements. Complex, higher-order interactions were observed under conditions of limiting DNA and varying protein concentrations. While the multiple protein:DNA complexes and unusual DNA concentration dependence are shown to involve the DCMTase, accurate quantitative analysis is precluded due to the uncertainty of binding stoichiometry and the relative affinities of each binding event (Senear & Brenowitz, 1991; Sackett & Saroff, 1996). Whereas many DNA-binding proteins, including DNA adenine-N6 methyltransferases (Reich & Mashhoon, 1990), form a single protein:DNA complex under similar conditions, some bacterial and mammalian DNA cytosine-C5 methyltransferases are known to produce multiple complexes at low DNA concentrations (Dubey & Roberts, 1992; Reale et al., 1995). How the higher-order complexes are assembled is not known.
Whether a second DCMTase molecule binds DNA directly or associates only with the first DCMTase bound cannot be assumed. However, the second binding event evidently drives the apparent $K_D^{DNA}$ down approximately 20-fold in comparison to the apparent $K_D^{DNA}$ derived from excess DNA conditions. In the case of the murine enzyme, the higher-order complexes are catalytically incompetent (Flynn et al., 1996) and we are currently studying this phenomenon. Gel mobility shift assays performed with micromolar DNA concentrations and limiting DCMTase result in a single, shifted DNA band. These observations are similar to those described for the bacterial cytosine DNA methyltransferases, M.MspI (Dubey & Roberts, 1992) and M.HhaI (Yang et al., 1995); the determination of equilibrium constants
under these conditions is valid and not unusual (M. T. Record, personal communication). Our enzyme preparation obeyed classical Michaelis-Menten kinetics with the same substrates when assayed in the same DNA concentration range (Flynn et al., 1996). The estimated $K_D^{DNA}$ values reported in Table 2 are about one-half of those determined at the level of $K_m^{DNA}$ for the same unmethylated double-stranded substrates. The lack of large differences between these constants suggests that steps following the initial formation of a specific protein:DNA complex do not contribute largely to $K_m^{DNA}$. The $K_D^{DNA}$ values for the hemi-methylated substrates determined by pre-steady-state kinetic assays in our previous study were in good agreement with the $K_D^{DNA}$ values determined in this study by GMSA under conditions of excess DNA. The catalytically competent DCMTase bound DNA in a 1:1 stoichiometry and had a strong preference for binding double-stranded DNA over single-stranded DNA. Hemi-methylated DNA was bound by the enzyme with slightly higher affinity than unmethylated double-stranded DNA. The $K_D^{DNA}$ data further support our hypothesis that the preference for hemi-methylated DNA versus unmethylated double-stranded DNA derives almost entirely from changes in the methylation rate constant, $k_{methylation}$ (Flynn et al., 1996). A recent study of M.HhaI:hemi-methylated DNA and M.HhaI:unmethylated DNA cocrystal structures attempted to rationalize the two- to threefold discrimination manifested by this enzyme at the level of binding (O'Gara et al., 1996). These authors proposed that the binding discrimination derives mostly from a single van der Waals contact between the Glu239 carboxylate and the methyl group of the 5-methyl-2′-deoxycytidine. While DCMTase also has a glutamate at this position (Glu1388), we suggest that other differences in the assembly of the active site contribute to the quantitatively larger preference of the murine enzyme for hemi-methylated DNA.
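The comparison between the dissociation constant and the Michaelis constant made above can be written out as a worked equation. The following is only a sketch, assuming a minimal scheme in which DNA binding is followed by a single irreversible forward step; the rate constants $k_{1}$, $k_{-1}$ and $k_{2}$ are illustrative symbols introduced here, and the real mechanism (cofactor binding, methyl transfer, product release) contains more steps.

```latex
\begin{align}
E + \mathrm{DNA} \;&\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; E{\cdot}\mathrm{DNA} \;\xrightarrow{\;k_{2}\;}\; E + P \\
K_D^{DNA} &= \frac{k_{-1}}{k_{1}}, \qquad
K_m^{DNA} = \frac{k_{-1} + k_{2}}{k_{1}} = K_D^{DNA} + \frac{k_{2}}{k_{1}} \\
\frac{K_D^{DNA}}{K_m^{DNA}} &= \frac{k_{-1}}{k_{-1} + k_{2}}
\end{align}
```

Under this simplified scheme, a ratio near one-half would correspond to $k_{2} \approx k_{-1}$, and the ratio approaches one as the forward step becomes slow relative to dissociation of the complex.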
The two base-pair, CpG, cognate sequence of the mammalian DCMTase is small compared to the cognate sites of most bacterial DNA methyltransferases. DNA footprint analyses of M.SssI, M.HhaI and M.MspI are consistent with protein:DNA interactions extending over 16 base-pairs (Renbaum & Razin, 1995; Dubey & Roberts, 1992). Thus, the large mammalian DCMTase protein (Glickman et al., 1997) most likely involves DNA contacts outside of this minimal sequence. Support for this is provided by our observation that the guanine/cytosine-rich GC-box element (GGGGCGGGGC) is bound approximately threefold more tightly than the adenine/thymine-rich CRE element (TGACGTCA). An in vitro selection method was designed to define both the span of the protein:DNA interface, and the sequence preference of the enzyme for nucleotides flanking the consensus CpG. Previous applications of this strategy were useful in defining a consensus sequence for DNA binding proteins
involving large differences in binding energetics between random and target sequences (Kinzler & Vogelstein, 1989; Thiesen & Bach, 1990; Blackwell et al., 1993; He et al., 1996). We aimed to extend these selection strategies to identify flanking sequence preferences, where binding discrimination is expected to be much less than when searching for a six to ten-base-pair cognate site. One potential outcome would be the lack of any preference, as described for the UBF protein using a similar method (Copenhaver et al., 1994). A consensus sequence larger than the minimal CpG was not likely to result from our selection process, because genomic sequencing of 5-mC reveals that the enzyme methylates many CpG contexts in vivo. Our screening method efficiently identified a DCMTase-induced population drift from 33.3% guanosine in the starting randomization to 50.0% in generation-1, 55.3% in generation-3 and finally 64.7% in generation-5. Randomized position 12 (see Figure 6) was enriched to 88% guanine in generation-5, suggesting that the total sequence space represented by the starting randomization was severely confined. Ultimately, the selection process did not disclose an obvious preferred sequence, but clearly a selection was evident. The finding that many CpG sequence contexts stabilize the DCMTase:DNA complex is consistent with the presumption that all of the roughly $3 \times 10^{7}$ CpG dinucleotides in the murine genome, including those in CpG islands, undergo methylation in vivo at one time or another during development. We could not find a reference directly linking methylation of the DCMTase selected generation-5 sequences and in vivo gene regulation. However, we found that four out of the 20 genetic loci identified as hits in Figure 7 have reports associating methylation with gene regulation. The region of the genome associated with Huntington's disease is imprinted (Petronis, 1996) and the age at onset of the disease correlates to changes in methylation (Reik et al., 1993).
Methylation has also been discovered at the CpG island in the trisomic region of chromosome 21 in patients with Down's syndrome and was postulated as an attenuation mechanism for fetal survival (Kuromitsu et al., 1997). Hox genes are methylation regulated (Flagiello et al., 1996) and the pseudoautosomal region of the murine Y chromosome is imprinted (Takahashi et al., 1994). These examples support the validity of our screening protocol and our hypothesis that the binding discrimination of mammalian DNA methyltransferases is influenced by guanine-rich sequences flanking a CpG dinucleotide. Sequence analysis of the generation-5 members provided evidence that the DCMTase may bind these substrates in a preferred orientation. A greater guanosine selectivity was associated with the far 5′-side of the CpG and a more divergent region was exposed from the −2 to the −5 positions (numbering as in Figure 6). The 3′-side of the invariant CpG exhibits a different DCMTase preference; GpT and TpG dinucleotides occur more
frequently and are often tandemly arranged. Empirically, the data do not allow for prediction of which strand may be poised to be methylated on the double-stranded substrate. The binding asymmetry was likely induced by the design of the starting population, because one strand was guanine-rich while the other was cytosine-rich. This design was chosen in order to avoid introducing multiple CpG dinucleotides that could complicate our assessment of flanking sequence contributions around a single CpG. Our results with the mammalian DCMTase, which suggest sequence-dependent binding effects for a 26-base-pair expanse (or more), are quite reasonable given the DNA footprinting results for the bacterial enzymes mentioned. Our quantitative and qualitative determinations of CpG flanking region contributions to DCMTase binding and catalysis are in conflict with a recent in vitro selection study (Yoder et al., 1997). These authors argue that DCMTase does not have any sequence discrimination. Experimental differences may account for the discrepancy. Yoder et al. performed a catalytic screen with a whole cell extract, from the same cell line used in our studies, which contained very low quantities of DCMTase. Our screen was for binding and the DCMTase preparation was highly homogeneous; therefore, our enzyme was not in competition with a myriad of cellular factors (e.g. transcription factors and nucleosomes) that could specifically or non-specifically bind certain sequences and effectively take them out of the DCMTase selection. In vitro screening methods, in general, rely on iterative screening procedures for making strong conclusions. Yoder et al. used a catalytic trapping method that did not allow for iterative screening. We increased the selective pressure for binding over five iterative generations in order to draw our conclusions: an increase in binding and catalytic specificity is achieved with guanine/cytosine-rich sequences flanking a CpG dinucleotide.
Moreover, their starting pool was severely biased. Our analysis of their starting pool sequences shows that guanine and adenine are over-represented at 41% and 28%, while cytosine and thymine are under-represented at 14% and 15%.

DCMTase interactions with DNA are influenced by helical geometries

Dinucleotide analyses have been useful for understanding sequence-dependent conformational parameters of DNA (El Hassan & Calladine, 1996; Hunter, 1993; Yanagi et al., 1991). Generally, dinucleotide conformational parameters have a limited range that is dependent on the two nucleotides immediately flanking the dinucleotide step in question; however, more distant nucleotides have been shown to have significant effects on CpG helical parameters (Lefebvre et al., 1996). These analyses provide the basis for a qualitative interpretation of DNA conformational features important for the
stabilization of the initial DCMTase:DNA complex. Guanosine-rich stretches, best represented in our studies by the GC-box and the DCMTase selected 5′-regions, often assume an A-DNA conformation (McCall et al., 1985; Yanagi et al., 1991; El Hassan & Calladine, 1996). A-DNA differs from B-DNA in that the minor groove is wide and shallow while the major groove is narrower and deeper. Although little is known about the DCMTase:DNA interface, the enzyme contains the peptide motif SPKK (residues 716 to 719), which is found in proteins known to interact with the minor groove of DNA (Churchill & Suzuki, 1989). The preference for sequences which have A-DNA-like features may be due to DCMTase:DNA interactions mediated by this motif. The GpT and TpG dinucleotide repeats, observed more frequently in the DCMTase-selected 3′-flank, have unique sets of conformational parameters that can increase helical flexibility (Nagaich et al., 1994; Beutel & Gold, 1992; Lyubchenko et al., 1993; Haniford & Pulleybank, 1983). Like the TpG step, CpG is considered "malleable" because the local conformations are dependent on flanking base-pairs (Lefebvre et al., 1995, 1996; Hunter, 1993; Prive et al., 1991; Grzeskowiak et al., 1991). Severe effects on the geometrical parameters associated with a centrally located CpG have been measured for at least 15 different sequences. Structures of oligonucleotides containing the consensus CRE element, TGACGTCA (Mauffret et al., 1992; Konig & Richmond, 1993), and several sequences closely related to the GC-box consensus, GGGGCGGGGC, have been determined. A small twist angle is characteristic of CpG embedded in guanine/cytosine-rich sequences and likely adds to the overall A-DNA character (Haran et al., 1987; Heinemann et al., 1987; Rabinovich et al., 1988; Verdaguer et al., 1991; Frederick et al., 1989; Conner et al., 1984; McCall et al., 1985).
Conversely, adenine/thymine-rich flanking sequences can lead to negative roll and high twist values at the CpG, so that these helices conform more to B-DNA (Lefebvre et al., 1996; Mauffret et al., 1992; Grzeskowiak et al., 1991; Prive et al., 1991; Bingman et al., 1992). The backbone torsion angles that connect the cytidine and guanosine residues in these structures are particularly interesting. The large slide associated with extensive inter-strand guanine stacking tends to stretch and contort the α and γ torsion angles into an unusual BII conformation (Haran et al., 1987; Rabinovich et al., 1988; Lefebvre et al., 1996; El Antri et al., 1993). Mechanically speaking, the BII conformation allows for a crankshaft motion to modulate a destacking of bases (Haran et al., 1987), which may be an early event in the base-flipping process mediated by DNA methyltransferases (Allan & Reich, 1996). BII may be more readily attained by a CpG with guanine/cytosine-rich flanking sequences than with adenine/thymine-rich ones. The functional importance of the CpG phosphate orientation and flexibility, and DCMTase:phosphate interactions in general, have been studied using the M.HhaI:DNA cocrystal structure (Klimasauskas et al., 1994; Cheng & Blumenthal, 1996). This structure has the target cytosine positioned outside of the helical cylinder covalently trapped by the enzyme. Surprisingly few contacts are made directly with the bases and extensive interactions with the backbone are asymmetrically located around the extrahelical cytosine. Interactions with the two phosphates on the 5′-side of this cytosine appear to be particularly important (5′-2pG3pC4pG5pC6p-3′) and only phosphates 2 through 5 show a displacement of several Å when compared with the uncomplexed DNA. The peptide regions that contact the phosphates are conserved among numerous bacterial cytosine DNA methyltransferases (Cheng & Blumenthal, 1996). For M.HhaI, phosphate 3p is contacted by Arg165 and Ser85, and sequence alignment suggests that Arg1315 and Ser1233 may play analogous roles in the mammalian DCMTase. Also, Arg98, which contacts 5p, and Lys90, which contacts 6p, in M.HhaI have homologous residues in the DCMTase, namely Lys1245 and Arg1237. The interactions between these conserved residues and the phosphates flanking the central CpG appear to be modulated by flanking sequences.

Conclusion

We conclude that the murine DCMTase has a DNA binding specificity that is similar to the catalytic specificity. The preference of the enzyme for guanine/cytosine-rich sequences may reflect a preferred positioning of backbone phosphates within the DCMTase:DNA complex. Whether DCMTase uses this binding discrimination to target certain genomic regions or to preferentially methylate guanine/cytosine-rich DNA in vivo has not been addressed. Human viruses are often guanine/cytosine-rich, and the discrimination we identified may aid in the specific deactivation of infected viral DNA. Because the sequence discrimination was not large, other regulatory factors may assist in the methylation process.
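To put the size of the binding discrimination in energetic terms, a fold difference in dissociation constant can be converted to a free-energy difference with ΔΔG = RT ln(fold). The sketch below is illustrative arithmetic only: the function name `ddg_from_fold` is hypothetical, and the default temperature of 0 °C is an assumption chosen because the binding reactions were incubated on ice.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold, temp_c=0.0):
    """Free-energy difference (kcal/mol) for a given fold difference in K_D.

    temp_c defaults to 0 degrees C, an assumption based on the
    on-ice incubation; pass another value to change it.
    """
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(fold)


if __name__ == "__main__":
    # Threefold: GC-box versus CRE binding.
    # Twentyfold: apparent K_D shift attributed to the second binding event.
    for fold in (3.0, 20.0):
        print(f"{fold:.0f}-fold -> {ddg_from_fold(fold):.2f} kcal/mol")
```

On this estimate a threefold preference amounts to roughly 0.6 kcal/mol, which is consistent with describing the sequence discrimination as small.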
Experimental Procedures

Materials

DCMTase was purified from mouse erythroleukemia cells as described (Xu et al., 1995). S-adenosyl-L-[methyl-3H]methionine (75 Ci/mmol, 1 mCi/ml, 1 Ci = 37 GBq) was from the Amersham Corporation. Unlabeled AdoMet, purchased from the Sigma Chemical Company, was further purified as described (Reich & Mashhoon, 1990). Routinely, a 125 mM AdoMet stock concentration was prepared at a specific activity of $5.8 \times 10^{3}$ cpm/pmol. DE81 filters were purchased from Whatman. All other chemicals and reagents were purchased from the Sigma Chemical Company or Fisher Scientific. The polyclonal antibody to DCMTase, pATH 52, was kindly provided by Dr Timothy Bestor (Columbia University).
DNA substrate preparation

The preparation, purification, and analysis of six oligonucleotides that mimic the GC-box and the cyclic AMP responsive elements (CRE) have been described (Flynn et al., 1996) (Table 1). The percentage of double-stranded DNA in annealed DNA samples was confirmed to be greater than 99% by 32P-radiolabeling, polyacrylamide gel separation, subsequent autoradiography and densitometry, using a CCD camera and the SW5000 analysis package from Ultra Violet Products (UVP, San Gabriel, CA).

Gel mobility shift assays

Gel mobility shift assays (GMSA) were performed with minor revisions to the original procedures (Fried & Crothers, 1981; Garner & Revzin, 1981). All reactions were done in 100 mM Hepes (pH 7.4), 10 mM EDTA, 10 mM DTT, 200 mg/ml BSA, 5% glycerol using the indicated 32P-labeled DNA and DCMTase concentrations, incubated on ice for five minutes and loaded on a 1× TBE (89 mM Tris-HCl (pH 8.3), 89 mM boric acid, 2 mM EDTA), 6% polyacrylamide gel. Electrophoresis was done at 250 V, 9 mA for two hours at 4 °C and the dried gel was exposed to film overnight. The reaction conditions for buffer, temperature, incubation time, cofactor addition and gel composition have all been optimized. Only slightly better complex resolution was obtained under the listed conditions compared to a ten minute incubation at 37 °C prior to gel loading at room temperature and containing either cofactor S-adenosyl-L-methionine, product S-adenosyl-L-homocysteine, or the AdoMet analog sinefungin. Hepes reaction buffer at pH 7.4 produced sharper banding than Tris-HCl at pH 8.0. Initial binding assays, with a limiting and constant DNA concentration, resulted in the formation of multiple bands. Subsequent assays used a limiting and constant enzyme concentration with varying DNA concentrations.

Binding isotherm determinations of $K_D^{DNA}$

Autoradiogram-derived band intensities corresponding to the mobility-shifted DCMTase:DNA complexes were acquired using the UVP system described above.
Background subtractions were from equivalent areas about one centimeter below each mobility shifted complex. The corrected intensities were then fit to a non-linear binding isotherm and graphed using KaleidaGraph 2.1.2 software (Synergy Software). The intensity of the labeled DNA in the protein:DNA complex at saturation was directly compared with uncomplexed DNA areas in control lanes containing 50%, 100% and 150% molar DNA equivalents of the DCMTase concentration.

Screening for DNA binding preferences

An in vitro selection approach was used to determine the DNA binding discrimination of DCMTase. A population of DNA molecules, each 66 base-pairs long, was synthesized with a central CpG dinucleotide flanked on each side by 12 positions randomized with adenosine, thymidine or cytidine; total complexity equal to $2.8 \times 10^{11}$ discrete sequences. Guanosine was not added to the randomization to avoid multiple CpG dinucleotides on a double-stranded DNA. The degenerate DNA had the sequence 5′-GGGAATTCATGGATCCTAAA(N)12CG(N)12TTTCAAGCTTGTGAATTCCC-3′. The randomized regions are flanked by PCR primer regions that contain the restriction sites used for cloning. Primer C had the sequence 5′-GGGAATTCATGGATCCT-3′ and Primer D was 5′-GGGAATTCACAAGCTTG-3′. The first generation pool of DNA was made double-stranded by Klenow polymerase extension of primer D. The screening procedure was reiterated five times. DNA substrates from each pooled generation that induced higher thermodynamic stabilities of the DCMTase:DNA complex were separated from lower affinity DNA by PAGE as described above. The region of the gel containing shifted DNA complexes was excised and five exchanges of 5 ml water over 72 hours shaking on ice were sufficient to elute greater than 95% of all cpm present in the excised gel slice as determined by Cerenkov counting. The eluted DNA was lyophilized, resuspended in TE (10 mM Tris-HCl (pH 8.0); 1 mM EDTA) and cleaned by one phenol:chloroform (1/1, v/v) and two chloroform extractions followed by ethanol precipitation and resuspension in TE. The selected DNA pools were amplified by 20 rounds of PCR with Deep Vent polymerase (New England Biolabs) and DNA primers C and D. The 66-base-pair DNA was separated from the PCR primers on agarose gels and purified using minor changes to the original procedure (Wieslander, 1979).

Identification of preferred DNA substrates

Individual members from the selected DNA pools were identified by cleaving the DNA ends with BamHI and EcoRI endonucleases and cloning into pGEM11zf (Promega) using standard protocols. The plasmid DNA from single isolates was prepared and the selected CpG flanking sequences were determined using the CircumVent sequencing kit (New England Biolabs). The selected inserts were sequenced from both strands using the bacteriophage T7 and SP6 sequencing primers (Promega).
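The complexity of the starting pool and the kind of base-composition tally used to follow a selection can be reproduced in a few lines. This is a minimal sketch and not the analysis pipeline used in the study; the function names and the toy clone sequences are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

FLANK = 12        # randomized positions on each side of the central CpG
ALPHABET = "ATC"  # bases used for randomization on the synthesized strand


def library_complexity(flank=FLANK, alphabet=ALPHABET):
    """Number of distinct sequences in a pool with 2 * flank random positions."""
    return len(alphabet) ** (2 * flank)


def base_fraction(seqs, base):
    """Overall fraction of one base across all sequences."""
    total = sum(len(s) for s in seqs)
    return sum(s.count(base) for s in seqs) / total


def positional_composition(seqs):
    """Per-position base fractions for aligned sequences of equal length."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: counts.get(b, 0) / len(seqs) for b in "ACGT"})
    return table


if __name__ == "__main__":
    print(library_complexity())  # 3**24, about 2.8e11

    # Made-up short "clones" (not data from the study), read on the
    # guanine-containing strand with the CpG in the middle.
    toy_clones = ["GGTCGTGT", "GAGCGGTG", "TGGCGTGA"]
    print(f"G fraction: {base_fraction(toy_clones, 'G'):.3f}")
    for pos, comp in enumerate(positional_composition(toy_clones), start=1):
        print(pos, comp)
```

Three choices at each of 24 positions give 3^24 = 282,429,536,481 sequences, which matches the stated complexity of about $2.8 \times 10^{11}$.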
Statistical analyses were performed using several programs in the Wisconsin Sequence Analysis Package (Genetics Computer Group) and KaleidaGraph (Synergy Software). Statistical significance was determined by Student's t-test using Microsoft Excel.
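The non-linear isotherm fit described under the binding isotherm determinations can be sketched as follows. This is not the KaleidaGraph procedure used in the study: it assumes a simple single-site hyperbola, I = Imax · [DNA] / (K_D + [DNA]), with free DNA approximated by total DNA because DNA is in excess over enzyme, and it swaps in a plain least-squares search written with the standard library. All function names and the example numbers are hypothetical.

```python
import math


def isotherm(dna, kd, imax):
    """Single-site binding isotherm: complex signal at a given DNA concentration."""
    return imax * dna / (kd + dna)


def best_imax(dna, signal, kd):
    """For a fixed K_D the model is linear in Imax, so solve it in closed form."""
    f = [d / (kd + d) for d in dna]
    return sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)


def sse(dna, signal, kd):
    """Sum of squared residuals at a given K_D, using the optimal Imax."""
    imax = best_imax(dna, signal, kd)
    return sum((y - isotherm(d, kd, imax)) ** 2 for d, y in zip(dna, signal))


def fit_isotherm(dna, signal, kd_lo=1e-3, kd_hi=1e3, grid=200, iters=60):
    """Least-squares fit of K_D and Imax.

    A coarse scan over log(K_D) brackets the minimum, then a
    golden-section search refines it inside that bracket.
    K_D is returned in the same units as the DNA concentrations.
    """
    lo, hi = math.log(kd_lo), math.log(kd_hi)
    logs = [lo + i * (hi - lo) / (grid - 1) for i in range(grid)]
    errs = [sse(dna, signal, math.exp(x)) for x in logs]
    k = min(range(grid), key=errs.__getitem__)
    a, b = logs[max(k - 1, 0)], logs[min(k + 1, grid - 1)]

    phi = (math.sqrt(5) - 1) / 2
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(dna, signal, math.exp(c)) < sse(dna, signal, math.exp(d)):
            b = d
        else:
            a = c
        c, d = b - phi * (b - a), a + phi * (b - a)

    kd = math.exp((a + b) / 2)
    return kd, best_imax(dna, signal, kd)


if __name__ == "__main__":
    # Made-up, background-corrected band intensities (arbitrary units)
    # at made-up DNA concentrations; not data from the study.
    dna_conc = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    intensity = [11.0, 20.5, 32.8, 50.9, 66.0, 80.7, 88.1]
    kd, imax = fit_isotherm(dna_conc, intensity)
    print(f"K_D = {kd:.2f}, Imax = {imax:.1f}")
```

Solving for Imax in closed form reduces the problem to a one-dimensional search over K_D, which keeps the sketch free of external fitting libraries. A dedicated non-linear regression routine would additionally report parameter uncertainties.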
Acknowledgments

The authors thank Dr J. Fraser Glickman for assisting in the purification of the enzyme and for insightful discussions about the enzyme. The critical review of the manuscript by Dr Stanley M. Parsons and Dr W. Brent Derry is fully appreciated. This work was supported by NIH grant GM 4 63333 to N.O.R.
References

Adams, R. L. P., Gardiner, K., Rinaldi, A., Bryans, M., McGarvey, M. & Burdon, R. H. (1986). Mouse ascites DNA methyltransferase: characteristic of size, proteolytic breakdown and nucleotide recognition. Biochim. Biophys. Acta, 868, 9-16.
Allan, B. A. & Reich, N. O. (1996). Targeted base stacking disruption by the EcoRI DNA methyltransferase. Biochemistry, 35, 14757-62.
Bestor, T. H. & Tycko, B. (1996). Creation of methylation patterns. Nature Genet. 12, 363-367.
Bestor, T. H., Gundersen, G., Kolsto, A. B. & Prydz, H. (1992). CpG islands in mammalian gene promoters are inherently resistant to de novo methylation. GATA, 9, 48-53.
Beutel, B. A. & Gold, L. (1992). In vitro evolution of intrinsically bent DNA. J. Mol. Biol. 228, 803-812.
Bingman, C. A., Zon, G. & Sundralingam, M. (1992). Crystal and molecular structure of the A-DNA dodecamer d(CCGTACGTACGG). J. Mol. Biol. 227, 738-756.
Blackwell, T. K., Huang, J., Ma, A., Kretzner, L., Alt, W., Eisenman, R. N. & Weintraub, H. (1993). Sequence-specific binding by the c-Myc protein. Mol. Cell. Biol. 13, 5216-5224.
Bolden, A. H., Nalin, C. M., Ward, C. A., Poonian, M. S. & Weissbach, A. (1986). Primary DNA sequence determines sites of maintenance and de novo methylation by mammalian DNA methyltransferases. Mol. Cell. Biol. 6, 1135-1140.
Carlson, L., Page, A. W. & Bestor, T. H. (1992). Properties and localization of DNA methyltransferase in preimplantation embryos: implications for genomic imprinting. Genes Dev. 6, 2536-2541.
Carotti, D., Palitti, F., Mastrantonio, S., Rispoli, M., Strom, R., Amato, A., Campagnari, F. & Whitehead, E. P. (1986). Substrate preferences of the human placental DNA methyltransferase investigated with synthetic polydeoxynucleotides. Biochim. Biophys. Acta, 866, 135-143.
Cheng, X. & Blumenthal, R. M. (1996). Finding a basis for flipping bases. Structure, 4, 639-645.
Christman, J. K., Sheikhnejad, G., Marasco, C. J. & Sufrin, J. R. (1995). 5-Methyl-2′-deoxycytidine in single-stranded DNA can act in cis to signal de novo DNA methylation. Proc. Natl Acad. Sci. USA, 92, 7347-7351.
Churchill, M. & Suzuki, M. (1989). "SPKK" motifs prefer to bind to DNA at A/T-rich sites. EMBO J. 8, 4189-4195.
Conner, B. N., Yoon, C., Dickerson, J. L. & Dickerson, R. E. (1984). Helix geometry and hydration in an A-DNA tetramer: CCGG. J. Mol. Biol. 174, 663-695.
Copenhaver, G. P., Putnam, C. D., Denton, M. L. & Pikaard, C. S. (1994). The RNA polymerase I transcription factor UBF is a sequence-tolerant HMG-box protein that can recognize structured nucleic acids. Nucl. Acids Res. 22, 2651-2657.
Dubey, A. K. & Roberts, R. J. (1992). Sequence-specific DNA binding by the MspI DNA methyltransferase. Nucl. Acids Res. 20, 3167-3173.
El Antri, S., Mauffret, O., Monnot, M., Lescot, E., Convert, O. & Fermandjian, S. (1993). Structural deviations at CpG provide a plausible explanation for the high frequency of mutation at this site. J. Mol. Biol. 230, 373-378.
El Hassan, M. A. & Calladine, C. R. (1996). Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA. J. Mol. Biol. 259, 95-103.
Flagiello, D., Poupon, M. F., Cillo, C., Dutrillaux, B. & Malfoy, B. (1996). Relationship between DNA methylation and gene expression of the HOXB gene cluster in small cell lung cancers. FEBS Letters, 380, 103-107.
Flynn, J., Glickman, J. F. & Reich, N. O. (1996). Murine DNA cytosine-C5 methyltransferase: pre-steady- and steady-state kinetic analyses with regulatory DNA sequences. Biochemistry, 35, 7308-7315.
Frederick, C. A., Quigley, G. J., Teng, M., Coll, M., Van der Marel, G. A., Van Boom, J. H., Rich, A. & Wang, H. J. (1989). Molecular structure of an A-DNA decamer d(ACCGGCCGGT). Eur. J. Biochem. 181, 295-307.
Fried, M. & Crothers, D. M. (1981). Equilibria and kinetics of lac repressor-operator interactions by polyacrylamide gel electrophoresis. Nucl. Acids Res. 9, 6505-6525.
Garner, M. M. & Revzin, A. (1981). A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: applications to components of the Escherichia coli lactose operon regulatory system. Nucl. Acids Res. 13, 3047-3060.
Glickman, J. F., Pavlovich, J. G. & Reich, N. O. (1997). Peptide mapping of the murine DNA methyltransferase reveals a major phosphorylation site and the start of translation. J. Biol. Chem. 272, 17851-17857.
Gruenbaum, Y., Cedar, H. & Razin, A. (1982). Substrate and sequence specificity of a eukaryotic DNA methylase. Nature, 295, 620-622.
Grzeskowiak, K., Yanagi, K., Prive, G. G. & Dickerson, R. E. (1991). The structure of B-helical CGATCGATCG and comparison with CCAACGTTGG. J. Biol. Chem. 266, 8861-8883.
Haniford, D. B. & Pulleybank, D. E. (1983). Facile transition of poly[d(TG):d(CA)] into a left-handed helix in physiological conditions. Nature, 302, 632-634.
Haran, T. E., Shakked, Z., Wang, A. H.-J. & Rich, A. (1987). The crystal structure of d(CCCCGGGG): A new A-form variant with an extended backbone conformation. J. Biomol. Struct. Dynam. 5, 199-217.
He, Y., Stockley, P. G. & Gold, L. (1996). In vitro evolution of the DNA binding sites of Escherichia coli methionine repressor, MetJ. J. Mol. Biol. 255, 55-66.
Heinemann, U., Lauble, H., Frank, R. & Blocker, H. (1987). Crystal structure analysis of an A-DNA fragment at 1.8 Å resolution: d(GCCCGGGC). Nucl. Acids Res. 15, 9531-9549.
Hepburn, P. A., Margison, G. P. & Tisdale, M. J. (1991). Enzymatic methylation of cytosine in DNA is prevented by adjacent O6-methylguanine residues. J. Biol. Chem. 266, 7985-7987.
Hunter, C. A. (1993). Sequence-dependent DNA structure. The role of base stacking interactions. J. Mol. Biol. 230, 1025-1054.
Iguchi-Ariga, S. & Schaffner, W. (1989). CpG methylation of the cAMP-responsive enhancer/promoter sequence TGACGTCA abolishes specific factor binding as well as transcriptional activation. Genes Dev. 3, 612-619.
Jane, S. M., Gumuchio, D. L., Ney, P. A., Cunningham, J. M. & Nienhuis, A. W. (1993). Methylation-enhanced binding of Sp1 to the stage selector element of the gamma-globin gene promoter may regulate developmental specificity of expression. Mol. Cell. Biol. 13, 3272-3281.
Joel, P., Shao, W. & Pratt, K. (1993). A nuclear protein with enhanced binding to methylated Sp1 sites in the AIDS virus promoter. Nucl. Acids Res. 21, 5786-5793.
Jones, P. A. (1985). Altering gene expression with 5-azacytidine. Cell, 40, 485-486.
Jost, J. P. & Saluz, H. P. (1993). DNA Methylation: Molecular Biology and Biological Significance, Birkhauser Verlag, Basel.
Kinzler, K. W. & Vogelstein, B. (1989). Whole genome PCR: application to the identification of sequences bound by regulatory proteins. Nucl. Acids Res. 17, 3645-3653.
Klimasauskas, S., Kumar, S., Roberts, R. J. & Cheng, X. (1994). HhaI methyltransferase flips its target base out of the DNA helix. Cell, 76, 357-369.
Konig, P. & Richmond, T. J. (1993). The X-ray structure of the GCN4-bZIP bound to ATF/CREB site shows the complex depends on DNA flexibility. J. Mol. Biol. 233, 139-154.
Kuromitsu, J., Yamashita, H., Kataoka, H., Tatahara, T., Muramatsu, M., Sekine, T., Okamoto, N., Furuichi, Y. & Hayashizaki, Y. (1997). A unique downregulation of h2-calponin gene expression in Down syndrome: a possible attenuation mechanism for fetal survival by methylation at the CpG island in the trisomic chromosome 21. Mol. Cell. Biol. 17, 707-712.
Lei, H., Oh, S. P., Okano, M., Juttermann, R., Goss, K. A., Jaenisch, R. & Li, E. (1996). De novo DNA cytosine methyltransferase activities in mouse embryonic stem cells. Development, 122, 3195-3205.
Lefebvre, A., Mauffret, O., Hartmann, B., Lescot, E. & Fermandjian, S. (1995). Structural behavior of the CpG step in two related oligonucleotides reflects its malleability in solution. Biochemistry, 34, 12019-12028.
Lefebvre, A., Mauffret, O., Lescot, E., Hartmann, B. & Fermandjian, S. (1996). Solution structure of the CpG containing d(CTTCGAAG)2 oligonucleotide: NMR data and energy calculations are compatible with a BI/BII equilibrium at CpG. Biochemistry, 35, 12560-12569.
Li, E., Bestor, T. H. & Jaenisch, R. (1992). Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell, 69, 915-926.
Lyubchenko, Y. L., Shlyakhtenko, L. S., Apella, E. & Harrington, R. E. (1993). CA runs increase DNA flexibility in the complex of λ Cro Protein with the OR3 site. Biochemistry, 32, 4121-4127.
Mauffret, O., Hartmann, B., Convert, O., Lavery, R. & Fermandjian, S. (1992). The fine structure of two dodecamers containing the cAMP responsive element sequence and its inverse. J. Mol. Biol. 227, 852-875.
McCall, M., Brown, T. & Kennard, O. (1985). The crystal structure of d(G-G-G-G-C-C-C-C). A model for poly(dG):poly(dC). J. Mol. Biol. 183, 385-396.
Moens, U., Subramanian, N., Johansen, B. & Aabakke, J. (1993). The c-fos cAMP-responsive element: regulation of gene expression by a β2-adrenergic agonist, serum and DNA methylation. Biochim. Biophys. Acta, 1173, 63-70.
Nagaich, A. K., Bhattacharyya, D., Brahmachari, S. K. & Bansal, M. (1994). CA/TG sequence at the 5′ end of oligo(A)-tracts strongly modulates DNA curvature. J. Biol. Chem. 269, 7824-7833.
O'Gara, M., Roberts, R. J. & Cheng, X. (1996). A structural basis for the preferential binding of hemimethylated DNA by HhaI DNA methyltransferase. J. Mol. Biol. 263, 597-606.
Petronis, A. (1996). Genomic imprinting in unstable DNA diseases. Bioessays, 18, 587-590.
Pfeifer, G. P., Spiess, E., Grunwald, S., Boehm, T. L. J. & Drahovsky, D. (1985). Mouse DNA-cytosine-5-methyltransferase: sequence specificity of the methylation reaction and electron microscopy of enzyme-DNA complexes. EMBO J. 4, 2879-2884.
Pradhan, S., Talbot, D., Sha, M., Benner, J., Hornstra, L., Li, E., Jaenisch, R. & Roberts, R. J. (1997). Baculovirus-mediated expression and characterization of the full-length murine DNA methyltransferase. Nucl. Acids Res. 25, 4666-4673.
Prive, G. G., Yanagi, K. & Dickerson, R. E. (1991). Structure of the B-DNA decamer C-C-A-A-C-G-T-T-G-G and comparison with isomorphous decamers C-C-A-A-G-A-T-T-G-G and C-C-A-G-G-C-T-G-G. J. Mol. Biol. 217, 177-199.
Rabinovich, D., Haran, T., Eisenstein, M. & Shakked, Z. (1988). Structures of the mismatched duplex d(G-G-G-T-G-C-C-C) and one of its Watson-Crick analogues d(G-G-G-C-G-C-C-C). J. Mol. Biol. 200, 151-161.
Ramachandani, S., MacLeod, A. R., Pinard, M., von Hofe, E. & Szyf, M. (1997). Inhibition of tumorigenesis by a cytosine-DNA, methyltransferase, antisense oligodeoxynucleotide. Proc. Natl Acad. Sci. USA, 94, 684-689.
Reale, A., Lindsay, H., Saluz, H. P., Pradhan, S., Adams, R. L. P., Jost, J. P. & Strom, R. (1995). DNA binding and methyl transfer catalyzed by mouse DNA methyltransferase. Biochem. J. 312, 855-861.
Reich, N. O. & Mashhoon, N. (1990). Inhibition of EcoRI DNA methylase with cofactor analogs. Biochemistry, 265, 8966-8970.
Reik, W., Maher, E. R., Morrison, P. J., Harding, A. E. & Simpson, S. A. (1993). Age at onset in Huntington's disease and methylation at D4S95. J. Med. Gen. 30, 185-188.
Renbaum, P. & Razin, A. (1995). Footprint analysis of M.SssI and M.HhaI methyltransferases reveals extensive interactions with the substrate DNA backbone. J. Mol. Biol. 248, 19-26.
Rouleau, J., Tanigawa, G. & Szyf, M. (1992). The mouse DNA methyltransferase 5′-region. A unique housekeeping gene promoter. J. Biol. Chem. 267, 7368-7377.
Sackett, D. L. & Saroff, H. A. (1996). The multiple origins of cooperativity in binding to multi-site lattices. FEBS Letters, 397, 1-6.
Santi, D. V., Garrett, C. E. & Barr, P. J. (1983). On the mechanism of inhibition of DNA cytosine methyltransferases by cytosine analogs. Cell, 33, 9-10.
Senear, D. F. & Brenowitz, M. (1991). Determination of binding constants for cooperative site-specific protein-DNA interactions using the gel mobility shift assay. J. Biol. Chem. 266, 13661-13671.
Smith, S. S., Kaplan, B. E., Sowers, L. C. & Newman, E. M. (1992). Mechanism of human methyl-directed DNA methyltransferase and the fidelity of cytosine methylation. Proc. Natl Acad. Sci. USA, 89, 4744-4748.
Takahashi, Y., Mitani, K., Kuwabara, K., Hayashi, T., Niwa, M., Miyashita, N., Moriwaki, K. & Kominami, R. (1994). Methylation imprinting was observed of mouse mo-2 macrosatellite on the pseudoautosomal region but not on chromosome 9. Chromosoma, 103, 450-458.
Thiesen, H. & Bach, C. (1990). Target detection assay (TDA): a versatile procedure to determine DNA binding sites as demonstrated on SP1 protein. Nucl. Acids Res. 18, 3203-3209.
Tucker, K. L., Talbot, D., Lee, M. A., Leonhardt, H. & Jaenisch, R. (1996). Complementation of methylation deficiency in embryonic stem cells by a DNA methyltransferase minigene. Proc. Natl Acad. Sci. USA, 93, 12920-12925.
Verdaguer, N., Aymami, J., Fernandez-Forner, D., Fita, I., Coll, M., Huynh-Dinh, T., Igolen, J. & Subirana, J. A. (1991). Molecular structure of a complete turn of A-DNA. J. Mol. Biol. 221, 623-635.
Wang, R. Y. H., Huang, L. H. & Ehrlich, M. (1984). Human placental DNA methyltransferase: DNA substrate and DNA binding specificity. Nucl. Acids Res. 12, 3473-3490.
Ward, C., Bolden, A., Nalin, C. M. & Weissbach, A. (1987). In vitro methylation of the 5′-flanking regions of the mouse β-globin gene. J. Biol. Chem. 262, 11057-11063.
Wieslander, L. (1979). A simple method to recover intact high molecular weight RNA and DNA after electrophoretic separation in low gelling temperature agarose gels. Anal. Biochem. 98, 305-3.
Xu, G., Flynn, J., Glickman, J. F. & Reich, N. O. (1995). Purification and stabilization of mouse DNA methyltransferase. Biochem. Biophys. Res. Commun. 207, 544-551.
Yanagi, K., Prive, G. G. & Dickerson, R. E. (1991). Analysis of local helix geometry in three B-DNA decamers and eight dodecamers. J. Mol. Biol. 217, 201-214.
Yang, A. S., Shen, J., Zingg, J., Mi, S. & Jones, P. A. (1995). HhaI and HpaII DNA methyltransferases bind DNA mismatches, methylate uracil and block DNA repair. Nucl. Acids Res. 23, 180-1387.
Yoder, J. A., Yen, R. C., Vertino, P. M., Bestor, T. H. & Baylin, S. B. (1996).
New 50 regions of the murine and human genes for DNA (cytosine-5) methyltransferase. J. Biol. Chem. 271, 31092 ±31097. Yoder, J. A., Soman, N. S., Verdine, G. L. & Bestor, T. H. (1997). DNA (cytosine-5)-methyltransferases in mouse cells and tissues. Studies with a mechanism based probe. J. Mol. Biol. 270, 385± 395.
Edited by A. Klug (Received 3 July 1997; received in revised form 17 February 1998; accepted 25 February 1998)