Profound Flanking Sequence Preference of Dnmt3a and Dnmt3b Mammalian DNA Methyltransferases Shape the Human Epigenome

doi:10.1016/j.jmb.2005.02.044

J. Mol. Biol. (2005) 348, 1103–1112

Vikas Handa1 and Albert Jeltsch2*

1 Institut für Biochemie, FB 08, Heinrich-Buff-Ring 58, Justus-Liebig-Universität Giessen, 35392 Giessen, Germany

2 International University Bremen, School of Engineering and Science, Campus Ring 1, 28759 Bremen, Germany

Mammalian DNA methyltransferases methylate cytosine residues within CG dinucleotides. By statistical analysis of published data of the Human Epigenome Project we have determined flanking sequences of up to ±four base-pairs surrounding the central CG site that are characteristic of high (5′-CTTGCGCAAG-3′) and low (5′-TGTTCGGTGG-3′) levels of methylation in human genomic DNA. We have investigated the influence of flanking sequence on the catalytic activity of the Dnmt3a and Dnmt3b de novo DNA methyltransferases using a set of synthetic oligonucleotide substrates that covers all possible ±1 flanks in quantitative terms. Methylation kinetics experiments revealed a >13-fold difference between the preferred (RCGY) and disfavored ±1 flanking base-pairs (YCGR). In addition, AT-rich flanks are preferred over GC-rich ones. These experimental preferences coincide with the genomic methylation patterns. Therefore, we have expanded our experimental analysis and found a >500-fold difference in the methylation rates of the consensus sequences for high and low levels of methylation in the genome. This result demonstrates a very pronounced flanking sequence preference of Dnmt3a and Dnmt3b. It suggests that the methylation pattern of human DNA is due, in part, to the flanking sequence preferences of the de novo DNA MTases and that flanking sequence preferences could be involved in the origin of CG islands. Furthermore, similar flanking sequence preferences have been found for the stimulation of the immune system by unmethylated CGs, suggesting a co-evolution of DNA MTases and the immune system. © 2005 Elsevier Ltd. All rights reserved.

*Corresponding author

Keywords: DNA methylation; CpG islands; flanking sequence preferences; epigenome; Dnmt3a

Introduction

Cytosine-5 methylation in mammals is an epigenetic modification that plays an important role in embryonic development, gene imprinting, X-chromosome inactivation, regulation of chromatin structure, silencing of transposons and endogenous retroviruses, cancer biology and genetic diseases.1–6 In mammals, cytosine methylation takes place predominantly at palindromic CG dinucleotides in both strands of the DNA. The mammalian genomes contain ~60 million CG dinucleotides and 70–80% of those are modified in a non-random pattern. The methylation pattern is inherited by daughter cell genomes during DNA replication by the action of DNA methyltransferase 1 (Dnmt1), which exhibits high preference for a hemimethylated DNA substrate.7–10 The genomic methylation pattern is set by de novo DNA methylation during gametogenesis in a sex-specific fashion and later, after extensive demethylation of the genome, during embryogenesis.5,11 The de novo methylation is carried out by two de novo DNA methyltransferases (MTases), Dnmt3a and Dnmt3b, which methylate unmethylated and hemimethylated DNA.12,13 The role of Dnmt3a and Dnmt3b in stage-specific de novo methylation of mammalian genomes correlates with their high expression in embryonic stem cells, early embryos and developing germ cells.12–15 The de novo methylation activity of Dnmt3b is associated with methylation of pericentromeric satellite regions.16–18

Abbreviations used: Dnmt1, DNA methyltransferase 1; MTases, methyltransferases.

E-mail address of the corresponding author: [email protected]


Dnmt3b−/− knockout mice die during late embryonic stage and the embryos lack methylation in the pericentromeric repeat region.17 ICF (a genetic disorder resulting from mutations in Dnmt3b) patients have low methylation in the pericentromeric satellite region of chromosomes 1, 9 and 16, leading to chromosome instability.19 Dnmt3a knockout mice show developmental abnormalities and die a few weeks after birth.17 The enzyme has been associated with the methylation of single copy genes and retrotransposons20–22 and it is critical to the establishment of the genomic imprint during germ cell development.23 In addition to their role in de novo methylation, Dnmt3a and Dnmt3b are involved in maintenance of DNA methylation at later stages, as they compensate for a lapse during conversion of hemimethylated DNA to the fully methylated state by Dnmt1.24,25 This is evident from the finding that Dnmt3a−/−/Dnmt3b−/− knockout embryonic stem cells lose genomic methylation gradually, although Dnmt1 is functional, but methylation can be regained by episomal expression of the de novo MTases.25 In this manner a delicate balance between de novo methylation and loss of methylation due to imperfect fidelity of Dnmt1 results in maintenance of genomic methylation levels.

Unlike restriction modification system enzymes and transcription factors, mammalian cytosine-5 DNA MTases have a short recognition sequence, CG, consisting of only 2 bases. There are interesting findings on various aspects of the DNA substrate sequence specificity of DNA MTases. Dnmt1 has been found to have several-fold higher preference for hemimethylated CG when compared to unmethylated substrate.7–10 There is no preference for flanking sequences reported for Dnmt1, although highly GC-rich flanking sequences have been found to bind to the enzyme with higher affinity.8 In mammalian genomes some non-CG cytosine residues have also been found to be methylated.
This observation is explained by the finding that Dnmt3a methylates non-canonical sites also with a decreasing order of efficiency for CA, CT and CC dinucleotides.13,26,27 Although one of the two Dnmt3 enzymes, Dnmt3b, is processive in nature there is no influence of CG density on methylation activity.25 Flanking sequence preferences of Dnmt3a were first detected by Lin et al.,28 who found a strong preference for a CG site flanked by pyrimidine bases and a loose consensus sequence of YNCGY.28 No such data are available for Dnmt3b. The results for Dnmt3a were based on in vitro methylation experiments in which plasmid DNA was methylated by Dnmt3a followed by bisulfite sequencing analysis. However, the influence of the flanking sequence on the rate of DNA methylation could not be quantified in that study. In addition, the number of different CG sites studied was too small to draw definite qualitative and quantitative conclusions on the influence of flanking sequences on Dnmt3a. If one assumes an influence of only up to three bases


upstream and downstream of the central CG, there are 4096 different flanks. Since one has to expect that the effect of each base at each position will depend on the nature of all other bases, there are only two ways to obtain statistically reliable information on flanking sequence preferences: (i) a large statistical survey must be performed in order to integrate the effects over many different flanking sequences; or (ii) synthetic substrates must be used in which one or a few bases are changed while keeping the remaining parts of the flank constant. To address the influence of flanking sequence on DNA methylation we have used two different approaches in combination. We analyzed the methylation pattern in human epigenomic data in the context of different flanking sequences around methylated CG dinucleotides. Here, 390 methylated CG sites were analyzed, allowing us to draw statistically relevant conclusions for longer flanks. In addition, biochemical experiments were performed using oligonucleotide substrates for methylation kinetics under single turnover conditions using different de novo MTases. Surprisingly, we found strong correlation of the flanking sequence preferences of Dnmt3a and Dnmt3b and the average methylation level of CG sites in the human genome.
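The figure of 4096 flanks quoted above follows from four possible bases at each of six positions (three upstream and three downstream of the central CG). The short Python sketch below only illustrates that count; the function names `count_flanks` and `enumerate_flanked_sites` are our own and do not come from the study.

```python
from itertools import product

BASES = "ACGT"


def count_flanks(n_each_side: int) -> int:
    """Number of distinct flanks with n bases on each side of the central CG."""
    return len(BASES) ** (2 * n_each_side)


def enumerate_flanked_sites(n_each_side: int):
    """Yield every top-strand site of the form 5'-N..N CG N..N-3'."""
    for up in product(BASES, repeat=n_each_side):
        for down in product(BASES, repeat=n_each_side):
            yield "".join(up) + "CG" + "".join(down)


if __name__ == "__main__":
    # Three bases on each side of the CG, as assumed in the text.
    print(count_flanks(3))
```

The same helper shows why an exhaustive experimental survey quickly becomes impractical: each additional flanking position on both sides multiplies the number of substrates by 16.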

Results

Human epigenomic data analysis

First, high-throughput data on the methylation pattern of the human genome were collected in the human epigenome pilot project, mostly for CpG islands, promoters and coding regions of genes, and the results were published recently.29–32 Using the Epigenome WEB site (http://www.sanger.ac.uk/perl/MVP/mvp), we analyzed the methylation levels at various CG sites in the context of their respective flanking sequences, looking for some regular pattern in the sequences in relation to methylation levels. A set of 390 methylated CG sites spanning a 220 kb region of human chromosome 6 was analyzed with respect to the mean methylation levels determined from various tissue samples for each site (Figure 1). The data set consisted of a heterogeneous population of CGs with methylation levels varying from zero to 100%. The 220 kb sequence belongs to the major histocompatibility gene locus and was found to have high gene density containing 23 genes, 11 CpG islands and a region comprising two Alu and one L1 repeat sequences. We were interested in the 8 bp flanking the central CG sites, four in the upstream and four in the downstream direction. The flanking sequences were arranged in the order of average methylation levels of corresponding CG sites. Using these data we wanted to look for consensus in the flanking sequences of those CGs that have

low or high methylation levels on average. The CG flanking sequences associated with various methylation levels were arranged in the order of corresponding average methylation level. In order to analyze the flanking sequences, the sequences were assigned to high-methylation and low-methylation categories. The high and low methylation groups were assigned using percentile cut-off values both at high and low methylation ends of the arranged sequences. To rule out any bias introduced by the definition of the cut-off values, five different groups of high and low methylation sites were defined on the basis of five different percentiles (8, 12, 16, 32, and 36 percentiles of low methylation levels) and (8, 12, 16, 24 and 32 percentiles of high methylation levels). To find any bias in occurrence of any base at a particular position in the ±4 bp flanking sequence of the central CG dinucleotide in the ten different subsets (five subsets of high methylation sites and five subsets of low methylation sites), ratios of frequencies of every base at each position of the subset and in the universal data set were calculated separately (see Table 1 and Supplementary Data). To check the significance of deviation in base frequency, Monte-Carlo simulations were used to generate 42 sets of random sequences (each corresponding to 16 percentile subsets of high and low methylation categories) with the original base composition of the data. On the basis of this distribution the probabilities of obtaining deviations similar to or larger than those seen in the two data sets were calculated to be 0.06% for low methylation and 9.5% for high methylation subsets. Therefore, the distribution of sequences associated with a low methylation level differs strongly from what one would expect by chance, but also the high methylation sites show a significant bias. The chance of obtaining a distribution that has the observed bias at both ends (high and low methylation) at the same time is given by the product of both individual numbers and is below 6×10−5. The significant P values for both the consensus sequences ruled out the possibility of

Figure 1. Methylation levels at various CG sites in the epigenomic data. The CG sites were arranged in the order of increasing methylation levels.

Table 1. Consensus sequences of bases flanking CG sites at positions ±1, 2, 3 and 4 for high and low methylation sites in human epigenome data

Percentile          Sequence

A. High methylation consensus sequence
8                   CCTCCGCAAG
12                  CCTGCGCAAG
16                  CTTGCGCAAC
24                  MTGGCGCATC
32                  CTKACGCAAS
Final consensus     CTTGCGCAAG
(P value: 0.095)

B. Low methylation consensus sequence
8                   TGTCCGGTGG
12                  TGGCCGGTGG
16                  TGGSCGGTGG
32                  TGTGCGGTGS
36                  TGTYCGGTGC
Final consensus     TGTTCGGTGG
(P value: 0.0006)

Each set of five consensus sequences exhibits high similarity resulting in the final consensus sequence for high and low methylation categories. The first column stands for the percentile of data analyzed and the second column has the corresponding consensus sequences. The P values for biased distribution when compared against random data sets were calculated for 16 percentile subsets.

random fluctuations as a cause of distinct consensus sequences for high and low methylation categories. The bases occurring most frequently at each position of the flank were collected to define a consensus sequence for each subset. We found that the five consensus sequences obtained were very similar within each class of the high and low methylation category (Table 1). Based on sequences of the five sets of both the classes, final consensus sequences for each class were determined to be 5′-CTTGCGCAAG-3′ for sites that show a high level of methylation and 5′-TGTTCGGTGG-3′ for low level sites. The distinct consensus sequences of high and low methylation level CG sites indicate an influence of flanking bases on the probability of a CG to be methylated.

Investigation of effect of bases at the ±1 position by methylation kinetics assay

This strong consensus in flanking sequences associated with high and low levels of methylation was an unexpected and interesting piece of information. Since the methylation pattern is set up by de novo MTases, we hypothesized that the flanking sequence effects might reflect the target site preferences of the Dnmt3a and Dnmt3b de novo DNA MTases. To check the flanking sequence effect experimentally and to quantify the effects, we used ten oligonucleotides containing six asymmetric and four palindromic combinations of ±1 nucleotide. Thereby, the ten substrates covered all 16 possible permutations at the ±1 position flanking the CG site (5′-NCGN-3′). Initial experiments were carried out in the sequence context of the most preferred bases at positions ±2, 3 and 4 (5′-CTTNCGNAAG-3′). The methylation kinetics were performed under single turnover conditions


to reflect the chemical step of the reaction and not the rate of product release in vitro, which most likely is not of relevance in vivo. We found large variations in the initial velocity for methylation of different DNA substrates by Dnmt3a (Figure 2). There was a more than 13-fold difference between the highest and lowest activity, which is a large effect considering that just one base-pair of the substrate DNA outside of the central recognition sequence differs between the substrates. The oligonucleotide with ACGC/GCGT sequence was found to be the most preferred substrate, followed by palindromic ACGT. The enzyme activity was found lowest with TCGG/CCGA, followed by GCGG/CCGC and palindromic CCGG, respectively. Careful observation revealed a pattern on the basis of which the sequences could be grouped into three classes with high, intermediate and low preference. This classification displayed an ordered trend that purine bases were preferred at the 5′ end and pyrimidine bases were preferred at the 3′ end. The opposite order resulted in low activity, and purine bases or pyrimidine bases at both ends had intermediate activity. In addition, CG base-pairs in the flanks tend to decrease the turnover rate of the enzyme. The preference for pyrimidine bases at the +1 position is in agreement with the reported data obtained with a different experimental approach.28 These favored and disfavored sequences are in good agreement with the results of the statistical analysis of epigenomic data (Table 2), suggesting that the flanking site preferences of de novo DNA MTases might be causal for the observed genomic methylation profile.

Figure 2. Initial velocity of Dnmt3a measured with DNA substrates containing the CG site flanked by an exhaustive set of permutations at the ±1 flanking position. The bar chart shows the wide range of activity with the lowest activity being 7.5% of the highest activity.

Table 2. Compilation of the experimental results of the influence of the ±1 flanks on the activity of Dnmt3a and comparison with the results of the statistical analysis of human epigenome data (see Table 1)

Experimental results for activity of Dnmt3a
                    Good substrates           Bad substrates
                    ACGC/GCGT                 TCGG/CCGA
                    ACGT                      GCGG/CCGC
                    GCGC                      CCGG
                                              TCGA
Consensus           RCGY                      YCGR

Results from the statistical analysis of human epigenome data
                    High methylation level    Low methylation level
                    CTTGCGCAAG                TGTTCGGTGG

Flanking sequence preference of the catalytic domains of Dnmt3a and Dnmt3b

The Dnmt3a and Dnmt3b MTases have two distinct domains, an N-terminal regulatory domain and a C-terminal catalytic domain. In order to investigate the role of the N-terminal regulatory domain in flanking sequence preference, we used functionally active C-terminal catalytic domains of Dnmt3a and Dnmt3b to check the sequence preference. We used the four palindromic sequences and found activity of both the catalytic domains in accordance with the results of the full-length Dnmt3a enzyme (Figure 3). The similarity of the results obtained with Dnmt3a and Dnmt3b prompted us to investigate the sequence preference of the bacterial M.SssI MTase as well, which also methylates CG sites. The results showed much smaller differences between different substrates (onefold at most) (A. Kiss, M. Roth, A.J. et al.,

Figure 3. Comparison of sequence preference of Dnmt3a and catalytic domains of Dnmt3a and Dnmt3b. The DNA substrates have four palindromic permutations of the ±1 flanks. The enzyme activity with ACGT sequence was normalized with data of the Dnmt3a enzyme. The two catalytic domains have similar preferences for flanking sequences, which is also comparable to the sequence preference of full-length Dnmt3a.


unpublished results). Furthermore, the ACGC and TCGG substrates, which are the extreme cases with Dnmt3a, are modified at the same rate by M.SssI. This observation confirms that the similar flanking sequence preferences of the catalytic domains of Dnmt3a, CD-Dnmt3b and full-length Dnmt3a are not due to an experimental artifact. We conclude that the N-terminal domain of Dnmt3a does not play an important role in the flanking sequence preference, as already suggested by the finding that the isolated catalytic domains are of comparable activity to the full-length enzymes.33,34 In addition, our results demonstrate that both the Dnmt3 enzymes share similar flanking sequence preferences, which may be explained by the high amino acid sequence similarity of the C-terminal regions shared by the two de novo enzymes.

Influence of the ±2, 3 and 4 position bases

The >13-fold difference in activity of Dnmt3a after variation of just 1 bp in front of and following the CG site is a very interesting observation, as it clearly shows a pronounced influence of flanking bases on enzyme catalysis. However, so far all our experiments were performed in the context of the high-methylation flanking sequences at the ±2, 3 and 4 positions. Next, we wanted to investigate if the bases at the ±2, 3 and 4 positions have any influence on the enzyme activity. An exhaustive experimental evaluation of the influence of farther flanks would require a very large number of substrate sequences. In lieu of this practically nearly impossible approach, we designed an experiment based on the information obtained from the epigenome data analysis. We designed two oligonucleotides with ACGC/GCGT (most preferred) and TCGG/CCGA (least preferred) sequences flanked by the least preferred sequence at positions ±2, 3 and 4 (5′-TGTNCGNTGG-3′) as determined from epigenomic data analysis results. We found that there was a nearly fourfold drop in activity for the ACGC/GCGT site in the context of the unfavorable outer flanking sequence. Furthermore, the enzyme activity at the TCGG/CCGA sequence approached zero when flanking sequences were changed from most preferred to least preferred at the ±2, 3 and 4 positions (Figure 4). When comparing best and worst overall flanks, we observed a very wide range of enzyme activities. When taking into account the detection limit in our experiment, we conclude that there is a >500-fold difference in the rates of methylation at 5′-CTTACGCAAG-3′ versus 5′-TGTTCGGTGG-3′ sites. This result indicates that the ±2, 3 and 4 positions have a pronounced influence on the catalytic activity of the Dnmt3 de novo MTases. Again, we observe that preferred flanks from statistical analysis of methylation levels closely correlate with the enzymatic activities of Dnmt3a and Dnmt3b, suggesting that the sequence preferences of Dnmt3a and Dnmt3b have a major influence on

Figure 4. Enzyme activity of Dnmt3a in context with the ±2, 3 and 4 positions of flanking bases in DNA substrate. The underlined and overlined bases indicate low and high preference sequences, respectively. The first two sequences are the most preferred permutation at the ±1 position flanked by most and least preferred flanks at the ±2, 3 and 4 positions, respectively, and the next two sequences are the least preferred permutation at the ±1 position flanked by most and least preferred flanks at the ±2, 3 and 4 positions, respectively. In both data sets, the enzyme activity drops sharply when the ±2, 3 and 4 positions of the flanks were changed from most preferred to least preferred bases.

shaping the methylation pattern of the human genome.
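The substrate design for the ±1 position (16 NCGN permutations covered by ten duplexes, four of them palindromic) and the RCGY/YCGR rule of thumb reported above can be reproduced with a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the three-way classification encodes the general trend stated in the text, not measured rates (for example, GCGG/CCGC falls in the "intermediate" class by this rule although it ranked among the poorer substrates in Table 2).

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_duplex_sites() -> dict:
    """Collapse the 16 NCGN sites into duplexes.

    A site and its reverse complement describe the same double-stranded
    substrate, so each duplex is stored once under the alphabetically
    smaller strand. The value records whether the site is palindromic.
    """
    duplexes = {}
    for five, three in product("ACGT", repeat=2):
        site = f"{five}CG{three}"
        partner = reverse_complement(site)
        duplexes.setdefault(min(site, partner), site == partner)
    return duplexes


def classify(site: str) -> str:
    """Apply the purine/pyrimidine trend described in the text to one strand."""
    five, three = site[0], site[-1]
    if five in PURINES and three in PYRIMIDINES:
        return "preferred (RCGY)"
    if five in PYRIMIDINES and three in PURINES:
        return "disfavored (YCGR)"
    return "intermediate"


if __name__ == "__main__":
    for site, palindromic in sorted(unique_duplex_sites().items()):
        label = "palindromic" if palindromic else "asymmetric"
        print(site, reverse_complement(site), label, classify(site))
```

Because RCGY and YCGR are each their own reverse-complement class, the label is the same whichever strand of a duplex is read.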

Discussion

It has been the purpose of this study to determine the flanking sequence preferences of the Dnmt3a and Dnmt3b enzymes and investigate their potential biological implications. During the last year, first results of high-throughput methylation analysis of human DNA have been published.29–32 Using available epigenomic data we discovered that there is a clear relationship between the tendency of a CG site to undergo methylation and its flanking sequence. There are distinct and statistically significant consensus sequences flanking CG sites that induce different levels of methylation (5′-CTTGCGCAAG-3′ for high and 5′-TGTTCGGTGG-3′ for low methylation) (Table 1). Although there are reports of recruitment of de novo methyltransferases by transcription factors that bind to DNA in a sequence-specific manner,35 this hardly explains such methylation bias at a global level. In order to understand this bias, we supposed it might reflect the intrinsic preferences of the de novo

MTases for certain flanking sequences. To check the flanking sequence effect on the methylation activity of de novo DNA MTases, oligonucleotide DNA substrates were designed and subjected to methylation kinetic studies with Dnmt3a and Dnmt3b. The inner ±1 flanks were investigated in an unbiased way using all possible flanks in an identical sequence context. Outer flanks (±2, 3 and 4) were checked using the consensus sequences associated with a high and low level of methylation in the epigenome data. These experiments revealed a more than 500-fold difference in the methylation rates observed at the best and worst substrate sites that correlates almost completely with the results from the statistical analysis of epigenome data. This finding strongly suggests that the flanking sequence preferences of Dnmt3a and Dnmt3b have a pronounced influence on the methylation pattern of human genomic DNA. Furthermore, we observe that C and G-rich flanks tend to reduce the activity of Dnmt3a and Dnmt3b. This property could be related to the fact that DNA segments containing many CG dinucleotides in a highly GC-rich sequence environment (CG islands) usually remain unmethylated during the wave of de novo methylation in the early embryo.5 Therefore, the flanking sequence preferences of Dnmt3a and Dnmt3b could have been one driving force in the evolution of CG islands.

It should be mentioned that this agreement of genomic methylation pattern and enzymatic properties of the de novo DNA MTases is of remarkable significance, keeping in mind that several factors may be involved in diluting the effect of sequence preference of enzymes on genomic methylation. De novo methylation of a region may depend on additional factors such as availability of free DNA, local chromatin structure, inhibition or recruitment of MTases by specific DNA-binding factors, interaction of Dnmt3 enzymes with histones, etc. In addition, flanking sequence preferences of Dnmt1 could affect genomic methylation levels. Nevertheless, we show here that the inherent properties of the Dnmt3 enzymes to prefer certain sequences over others play an important role in shaping the genomic methylation pattern. The very low activity of Dnmt3a and Dnmt3b at 5′-TGTTCGGTGG-3′ sites is particularly interesting, as it shows that there are some CG sites in the genome that are highly discriminated by the de novo enzymes and methylation might take place only under special circumstances such as involvement of some recruiting factors or factors like Dnmt3L that stimulate de novo MTases.20,21,36–38

Our experimental data at positions −1 and +1 can be compared to a previous report on the sequence preference of Dnmt3a.28 Like us, Lin et al. found a strong preference for pyrimidine at position +1. However, there are differences at the −1 position, because Lin et al. did not detect a preference for any base at this position whereas we have found strong preference for purine bases. This difference can be explained by the different


experimental approaches. Lin et al. have investigated methylation of random DNA and analyzed the methylation levels of CpG sites in the context of different flanking sequences. The conclusion drawn from this approach on the preferences at the ±1 position may be biased due to the influence of outer flanks, because outer flank influence is not statistically averaged. Therefore, it is possible that Lin et al. have missed a possible contribution of a purine at the −1 position. For example, it is feasible that many RCG sites in their data set are positioned within unfavorable outer flanks, or many YCG sites are within favorable outer flanks. In contrast, we investigated all possible ±1 flanks in an identical sequence context. In this manner we could dissect out the effects associated with individual permutations at position ±1 very accurately. Our result is supported by the finding of a complementary consensus sequence for poor substrates that has a pyrimidine at −1 and a purine at +1 position. In addition, our results correlate well with the statistical analysis of human epigenome data that is based on a large data set of 390 CG sites, thereby ensuring sufficient averaging of outer flank effects. An agreement of Dnmt3a and Dnmt3b catalytic efficiencies and genomic methylation levels was also observed in experiments directly investigating the outer flank effects, where a >500-fold difference in the methylation rates of different oligonucleotide substrates was found to be correlated with the genomic methylation level of the corresponding flanking sequences. Our finding that the flanking sequence preferences of Dnmt3a and Dnmt3b are reflected by the human epigenome data indicates an important role for Dnmt3a and Dnmt3b in setting initial patterns of DNA methylation.
Furthermore, since insufficient maintenance methylation by Dnmt1 is counteracted by low level of de novo methylation, the Dnmt3a and Dnmt3b also play a role in the preservation of methylation levels.24,25,39 In the human epigenome data one frequently observes methylation patterns in which one highly methylated site is embedded into a low or intermediate methylation region, or in which a low methylation site is surrounded by high methylation sites. These can be explained by selective targeting of the MTase to a high methylation site and blocking of methylation by other proteins at a low methylation site. However, we demonstrate here that another explanation that also should be considered is that the flanking sequences of the site are contributory to the effect. In a recent report, CG islands have been classified into methylation-prone and resistant categories. The results were based on overexpression of Dnmt1 followed by detection of methylation levels in CG islands of various genes.40 Based on flanking sequence preference information of de novo MTases, we analyzed sequences of methylation-prone and resistant CG islands. However, there was no significant difference found in the two sets of CG islands, indicating that Dnmt1 sequence preference


is not related to the flanking sequence preference of Dnmt3 enzymes (data not shown).

The biological implications of the sequence preferences of Dnmt3a and Dnmt3b de novo MTases might extend beyond the mere methylation level of human DNA. DNA containing unmethylated CG dinucleotide sequences is immunogenic in mammals. Unmethylated CG sites stimulate B cells to produce IL-6 and IL-12, CD4+ T cells to produce IL-6 and IFN-γ, and NK cells to produce IFN-γ.41,42 In several reports it has been shown that DNA with CG flanked by purine at the 5′ end and pyrimidine at the 3′ end induces a higher immunogenic response when compared to other sequences.41,43 This consensus sequence is identical with the high preference consensus sequence for DNA MTases found by us. It is an interesting observation that the flanking sequence that renders high immunogenicity to unmethylated CG dinucleotide sites belongs to the most preferred consensus sequence for de novo DNA methyltransferases. Therefore, the sequences with highest immunogenicity have the lowest probability to be unmethylated in the human DNA, which minimizes the risk of an autoimmune response generated from self DNA. This observation indicates co-evolution of de novo DNA MTases and immune system in the context of CG dinucleotides and the flanking sequences.

Conclusions

We have studied the flanking sequence preferences of Dnmt3a and Dnmt3b, extending a pioneering study by Lin et al. on Dnmt3a.28 We have studied the influence of ±1 flanks on the activity of Dnmt3a and Dnmt3b by determining the methylation rate of all possible sites within the same sequence context in quantitative terms. On the basis of our data, we define a consensus both for favored and disfavored sequences which match each other reasonably well. The Dnmt3a and Dnmt3b enzymes have very similar flanking sequence preferences. Our results show that de novo mammalian DNA MTases exhibit profound preference for bases flanking a CG site (5′-CTTACGCAAG-3′ consensus sequence) on one hand and show almost no activity for some flanking sequences (5′-TGTTCGGTGG-3′ consensus sequence) on the other hand. The effects are so strong that certain CG sites are almost refractory to methylation in vitro. We have employed a bioinformatics approach to analyze the DNA methylation patterns of human DNA. We found that the significant positive and negative flanking sequence bias of de novo MTases is reflected in genomic DNA methylation levels. This finding demonstrates that the in vitro properties of Dnmt3a and Dnmt3b observed here are of clear relevance in vivo. Our results suggest the intrinsic sequence preference of de novo MTases could be one parameter that influences the generation of the DNA methylation patterns of mammalian genomes, a process that is largely not understood so far. In addition, we found a preference of Dnmt3a and Dnmt3b for AT-rich ±1 flanks that could be correlated to the origin of CG islands, which are usually unmethylated in the germ line. The preferred flanking sequences have also been found to be correlated to the immune response elicited by an unmethylated CG motif containing DNA depending on the preceding and succeeding bases, indicating a co-evolution of DNA MTases and the immune system.

Materials and Methods

Nomenclature

Throughout this work, the bases flanking the central CG site are designated as illustrated below.

$$5'\text{-}N\;N_{-4}\;N_{-3}\;N_{-2}\;N_{-1}\;CG\;N_{+1}\;N_{+2}\;N_{+3}\;N_{+4}\;N\text{-}3'$$

Oligodeoxynucleotides

HPLC-purified oligodeoxynucleotides were purchased from MWG (Ebersberg, Germany). The quality of the oligonucleotide synthesis was confirmed by denaturing polyacrylamide gel electrophoresis, demonstrating that all oligonucleotides had the expected length and were pure to >95%. The concentrations of oligodeoxynucleotide solutions were determined spectroscopically using E260 values provided by the supplier. Duplex oligodeoxynucleotides were prepared by adding equimolar amounts of complementary strands, heating to 95 °C and slow-cooling to room temperature. Complete annealing was confirmed by native polyacrylamide gel electrophoresis, demonstrating the absence of detectable amounts of single-stranded DNA after the annealing process. Following are the sequences of all the oligonucleotide substrates used, where Bt denotes biotin. Every oligonucleotide has the same sequence, except for four altered bases flanking CG on either side.

s1CG   5′-Bt-gaagctgggacttccggaaggagagtgcaa-3′
a1CG   5′-ttgcactctccttccggaagtcccagcttc-3′
s1AA   5′-Bt-gaagctgggacttacgaaaggagagtgcaa-3′
a1AA   5′-ttgcactctcctttcgtaagtcccagcttc-3′
s1AT   5′-Bt-gaagctgggacttacgtaaggagagtgcaa-3′
a1AT   5′-ttgcactctccttacgtaagtcccagcttc-3′
s1AG   5′-Bt-gaagctgggacttacggaaggagagtgcaa-3′
a1AG   5′-ttgcactctccttccgtaagtcccagcttc-3′
s1AC   5′-Bt-gaagctgggacttacgcaaggagagtgcaa-3′
a1AC   5′-ttgcactctccttgcgtaagtcccagcttc-3′
s1TA   5′-Bt-gaagctgggactttcgaaaggagagtgcaa-3′
a1TA   5′-ttgcactctcctttcgaaagtcccagcttc-3′
s1TC   5′-Bt-gaagctgggactttcgcaaggagagtgcaa-3′
a1TC   5′-ttgcactctccttgcgaaagtcccagcttc-3′
s1TG   5′-Bt-gaagctgggactttcggaaggagagtgcaa-3′
a1TG   5′-ttgcactctccttccgaaagtcccagcttc-3′
s1GC   5′-Bt-gaagctgggacttgcgcaaggagagtgcaa-3′
a1GC   5′-ttgcactctccttgcgcaagtcccagcttc-3′
s1GG   5′-Bt-gaagctgggacttgcggaaggagagtgcaa-3′
a1GG   5′-ttgcactctccttccgcaagtcccagcttc-3′
s2AC   5′-Bt-gaagctgggatgtacgctgggagagtgcaa-3′
a2AC   5′-ttgcactctcccagcgtacatcccagcttc-3′
s2TG   5′-Bt-gaagctgggatgttcggtgggagagtgcaa-3′
a2TG   5′-ttgcactctcccaccgaacatcccagcttc-3′
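As a quick consistency check on the substrate list, the following minimal Python sketch tests whether each antisense strand is the exact reverse complement of its sense strand and reads the ±1 flank from the sense strand. The helper names (`reverse_complement`, `is_perfect_duplex`, `flank_label`) are hypothetical and not part of the original work; only four of the twelve duplexes are copied in for brevity, and the biotin label is omitted.

```python
# Illustrative sketch only: helper names are hypothetical, sequences are
# copied from the oligonucleotide list (biotin label omitted).
COMPLEMENT = str.maketrans("acgt", "tgca")

# name -> (sense strand, antisense strand), both written 5' to 3'
DUPLEXES = {
    "1CG": ("gaagctgggacttccggaaggagagtgcaa", "ttgcactctccttccggaagtcccagcttc"),
    "1AC": ("gaagctgggacttacgcaaggagagtgcaa", "ttgcactctccttgcgtaagtcccagcttc"),
    "2AC": ("gaagctgggatgtacgctgggagagtgcaa", "ttgcactctcccagcgtacatcccagcttc"),
    "2TG": ("gaagctgggatgttcggtgggagagtgcaa", "ttgcactctcccaccgaacatcccagcttc"),
}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_perfect_duplex(sense, antisense):
    """True if the two strands are fully complementary over their length."""
    return reverse_complement(sense) == antisense


def flank_label(sense):
    """Return the -1 and +1 bases around the first CG site, e.g. 'AC'.

    Assumes the strand contains the target CG as its first 'cg' occurrence,
    which holds for the substrates listed here.
    """
    site = sense.index("cg")
    return (sense[site - 1] + sense[site + 2]).upper()


for name, (sense, antisense) in DUPLEXES.items():
    print(name, is_perfect_duplex(sense, antisense), flank_label(sense))
```

The two-letter part of each oligonucleotide name corresponds to the bases at positions −1 and +1 of the sense strand, so `flank_label` can also serve as a sanity check when extending the dictionary to the remaining substrates.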


Epigenomic data analysis

The human epigenomic data were collected from the web site†.29,30 For our analysis we used all available data for a continuous stretch of DNA on human chromosome 6 between positions 31570047 and 31789835, which was chosen arbitrarily and is of sufficient size to represent the whole data set. Methylation data were extracted for sequences comprising 10 bp containing the CG motif in the center. We used the percentage of methylation for each CG site deposited in the database. The arithmetic mean of the percentage of methylation among different samples and different tissues was calculated for every CG site. Thereby, every CG site was used just once, and the different numbers of tissues analyzed for the different sites did not influence our results. The mean percentage methylation values were listed in increasing order alongside the corresponding flanking sequences of the CG sites. The data set was divided into low and high methylation classes in many overlapping subsets using different cut-off values. The relative frequency of each base at each flanking position was calculated as the ratio of its frequency in the subset to its frequency in the universal set. This was used to find the base with maximum occurrence at a particular position. To determine the significance of the bias in the frequency of occurrence of each base, we first calculated a factor (B_i) that describes the deviation of the observed base distribution at each position (i) from the distribution expected on the basis of the overall frequencies of all four bases at each position in the overall data set:

$$B_i = \sqrt{\left(n^{A}_{obs} - n^{A}_{exp}\right)^2 + \left(n^{T}_{obs} - n^{T}_{exp}\right)^2 + \left(n^{C}_{obs} - n^{C}_{exp}\right)^2 + \left(n^{G}_{obs} - n^{G}_{exp}\right)^2}$$

where $n^{X}_{obs}$ and $n^{X}_{exp}$ denote the number of base X observed and expected at the respective flanking position. The overall bias (B) was defined as the sum of the individual biases for all eight flanking positions:

$$B = \sum_{i=-4}^{4} B_i$$

To estimate the significance of this number, a Monte Carlo simulation was performed. A randomized set of sequences with the same overall distribution of all bases at each position as in the experimental set was generated. Using this set, random B-values were determined to obtain the average B-value and its standard deviation. The random B-values showed a Gaussian distribution. Using these numbers, the probability of obtaining the observed deviation by chance alone was calculated by standard statistical procedures.

† http://www.sanger.ac.uk/perl/MVP/mvp

Expression and purification of enzymes

Recombinant expression of full-length Dnmt3a and of the catalytic domains of Dnmt3a and Dnmt3b was carried out in BL21 E. coli cells, using the pETDnmt3a, pETDnmt3aCD and pETDnmt3bCD plasmids. Transformed cells were grown at 37 °C in 500 ml of LB medium containing 75 μg/ml of kanamycin. Protein expression was induced at a cell density of 0.3 A600 nm by addition of 1 mM IPTG, and cells were grown for an additional hour at 37 °C. Protein purification was carried out by Ni-NTA affinity chromatography as described.33

Methylation kinetics

DNA methylation assays using double-stranded oligodeoxynucleotide substrates were carried out in a microtitre plate as described.44 The DNA substrate and enzyme concentrations were 0.5 μM each in methylation buffer (20 mM Hepes (pH 7.0), 1 mM EDTA) at 37 °C for periods of 1, 2, 4, 8, 12, 16, 24 and 40 minutes. Labeled S-[methyl-3H]adenosyl-L-methionine (3048 GBq/mmol, NEN) was used at 0.76 μM. All methylation experiments were carried out at least in triplicate and the results averaged. Standard deviations of the average methylation rates were below ±20%.

Acknowledgements

This work has been supported by grants from the BMBF (BioFuture programme), DFG (JE 252/1 and JE 252/4) and the Fonds der Chemischen Industrie. Thanks are due to H. Gowher for providing purified CDDnmt3b, and to M. Roth and A. Kiss for providing data on M.SssI kinetics prior to publication. We thank the Human Epigenome Consortium (http://www.epigenome.org/) for open access and pre-publication release of data.

Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2005.02.044
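The bias statistic and Monte Carlo test described under "Epigenomic data analysis" can be illustrated with a minimal Python sketch. It is not the authors' code. Several details are assumptions made for illustration: all function names are hypothetical, sequences are uppercase 10-mers with CG at the two central positions, the expected count of a base is taken as the subset size times its per-position frequency in the full data set, random sets are drawn position by position from those frequencies, and the "standard statistical procedures" are represented by a one-sided Gaussian tail probability.

```python
import math
import random
from collections import Counter

BASES = "ACGT"
# Indices of flanking positions -4..-1 and +1..+4 in a 10-mer with CG at 4-5.
FLANK_IDX = [0, 1, 2, 3, 6, 7, 8, 9]


def position_frequencies(seqs):
    """Per-position base frequencies over the full (universal) data set."""
    n = len(seqs)
    freqs = []
    for i in FLANK_IDX:
        counts = Counter(s[i] for s in seqs)
        freqs.append({b: counts[b] / n for b in BASES})
    return freqs


def bias(subset, freqs):
    """Overall bias B: sum over flanking positions of B_i, where
    B_i = sqrt(sum over bases of (n_obs - n_exp)^2)."""
    n = len(subset)
    total = 0.0
    for k, i in enumerate(FLANK_IDX):
        obs = Counter(s[i] for s in subset)
        total += math.sqrt(
            sum((obs[b] - n * freqs[k][b]) ** 2 for b in BASES)
        )
    return total


def random_subset(n, freqs, rng):
    """Draw n random 10-mers whose flanks follow the per-position frequencies."""
    seqs = []
    for _ in range(n):
        flank = [
            rng.choices(BASES, weights=[f[b] for b in BASES])[0] for f in freqs
        ]
        seqs.append("".join(flank[:4]) + "CG" + "".join(flank[4:]))
    return seqs


def monte_carlo_p(subset, all_seqs, trials=1000, seed=0):
    """Return (B_observed, mean of random B, its sd, one-sided Gaussian p)."""
    rng = random.Random(seed)
    freqs = position_frequencies(all_seqs)
    b_obs = bias(subset, freqs)
    sims = [
        bias(random_subset(len(subset), freqs, rng), freqs)
        for _ in range(trials)
    ]
    mean = sum(sims) / trials
    sd = math.sqrt(sum((x - mean) ** 2 for x in sims) / (trials - 1))
    z = (b_obs - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return b_obs, mean, sd, p
```

In use, `all_seqs` would hold every 10-mer in the analyzed region and `subset` the sequences in one methylation class (for example, those below a chosen cut-off); a small `p` then indicates that the base composition of the subset deviates from the overall composition more than expected by chance.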

References

1. Ehrlich, M. (2003). Expression of various genes is controlled by DNA methylation during mammalian development. J. Cell. Biochem. 88, 899–910.
2. Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21.
3. Jeltsch, A. (2002). Beyond Watson and Crick: DNA methylation and molecular enzymology of DNA methyltransferases. ChemBiochem, 3, 274–293.
4. Jones, P. A. & Takai, D. (2001). The role of DNA methylation in mammalian epigenetics. Science, 293, 1068–1070.
5. Li, E. (2002). Chromatin modification and epigenetic reprogramming in mammalian development. Nature Rev. Genet. 3, 662–673.
6. Feinberg, A. P. & Tycko, B. (2004). The history of cancer epigenetics. Nature Rev. Cancer, 4, 143–153.
7. Zucker, K. E., Riggs, A. D. & Smith, S. S. (1985). Purification of human DNA (cytosine-5-)-methyltransferase. J. Cell. Biochem. 29, 337–349.
8. Flynn, J., Azzam, R. & Reich, N. (1998). DNA binding discrimination of the murine DNA cytosine-C5 methyltransferase. J. Mol. Biol. 279, 101–116.
9. Fatemi, M., Hermann, A., Pradhan, S. & Jeltsch, A. (2001). The activity of the murine DNA methyltransferase Dnmt1 is controlled by interaction of the catalytic domain with the N-terminal part of the enzyme leading to an allosteric activation of the enzyme after binding to methylated DNA. J. Mol. Biol. 309, 1189–1199.
10. Pradhan, S., Bacolla, A., Wells, R. D. & Roberts, R. J. (1999). Recombinant human DNA (cytosine-5) methyltransferase. I. Expression, purification, and comparison of de novo and maintenance methylation. J. Biol. Chem. 274, 33002–33010.
11. Meehan, R. R. (2003). DNA methylation in animal development. Semin. Cell. Dev. Biol. 14, 53–65.
12. Okano, M., Xie, S. & Li, E. (1998). Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nature Genet. 19, 219–220.
13. Gowher, H. & Jeltsch, A. (2001). Enzymatic properties of recombinant Dnmt3a DNA methyltransferase from mouse: the enzyme modifies DNA in a non-processive manner and also methylates non-CpG (correction of non-CpA) sites. J. Mol. Biol. 309, 1201–1208.
14. Huntriss, J., Hinkins, M., Oliver, B., Harris, S. E., Beazley, J. C., Rutherford, A. J. et al. (2004). Expression of mRNAs for DNA methyltransferases and methyl-CpG-binding proteins in the human female germ line, preimplantation embryos, and embryonic stem cells. Mol. Reprod. Dev. 67, 323–336.
15. Chen, T., Ueda, Y., Xie, S. & Li, E. (2002). A novel Dnmt3a isoform produced from an alternative promoter localizes to euchromatin and its expression correlates with active de novo methylation. J. Biol. Chem. 277, 38746–38754.
16. Hansen, R. S., Wijmenga, C., Luo, P., Stanek, A. M., Canfield, T. K., Weemaes, C. M. & Gartler, S. M. (1999). The DNMT3B DNA methyltransferase gene is mutated in the ICF immunodeficiency syndrome. Proc. Natl Acad. Sci. USA, 96, 14412–14417.
17. Okano, M., Bell, D. W., Haber, D. A. & Li, E. (1999). DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell, 99, 247–257.
18. Xu, G. L., Bestor, T. H., Bourc’his, D., Hsieh, C. L., Tommerup, N., Bugge, M. et al. (1999). Chromosome instability and immunodeficiency syndrome caused by mutations in a DNA methyltransferase gene. Nature, 402, 187–191.
19. Ehrlich, M. (2003). The ICF syndrome, a DNA methyltransferase 3B deficiency and immunodeficiency disease. Clin. Immunol. 109, 17–28.
20. Hata, K., Okano, M., Lei, H. & Li, E. (2002). Dnmt3L cooperates with the Dnmt3 family of de novo DNA methyltransferases to establish maternal imprints in mice. Development, 129, 1983–1993.
21. Bourc’his, D. & Bestor, T. H. (2004). Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature, 431, 96–99.
22. Bourc’his, D., Xu, G. L., Lin, C. S., Bollman, B. & Bestor, T. H. (2001). Dnmt3L and the establishment of maternal genomic imprints. Science, 294, 2536–2539.
23. Kaneda, M., Okano, M., Hata, K., Sado, T., Tsujimoto, N., Li, E. & Sasaki, H. (2004). Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature, 429, 900–903.
24. Riggs, A. D. & Xiong, Z. (2004). Methylation and epigenetic fidelity. Proc. Natl Acad. Sci. USA, 101, 4–5.
25. Chen, T., Ueda, Y., Dodge, J. E., Wang, Z. & Li, E. (2003). Establishment and maintenance of genomic methylation patterns in mouse embryonic stem cells by Dnmt3a and Dnmt3b. Mol. Cell. Biol. 23, 5594–5605.
26. Dodge, J. E., Ramsahoye, B. H., Wo, Z. G., Okano, M. & Li, E. (2002). De novo methylation of MMLV provirus in embryonic stem cells: CpG versus non-CpG methylation. Gene, 289, 41–48.
27. Ramsahoye, B. H., Biniszkiewicz, D., Lyko, F., Clark, V., Bird, A. P. & Jaenisch, R. (2000). Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad. Sci. USA, 97, 5237–5242.
28. Lin, I. G., Han, L., Taghva, A., O’Brien, L. E. & Hsieh, C. L. (2002). Murine de novo methyltransferase Dnmt3a demonstrates strand asymmetry and site preference in the methylation of DNA in vitro. Mol. Cell. Biol. 22, 704–723.
29. Eckhardt, F., Beck, S., Gut, I. G. & Berlin, K. (2004). Future potential of the human epigenome project. Expert Rev. Mol. Diagn. 4, 609–618.
30. Rakyan, V. K., Hildmann, T., Novik, K. L., Lewin, J., Tost, J., Cox, A. V. et al. (2004). DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol. 2, e405.
31. Novik, K. L., Nimmrich, I., Genc, B., Maier, S., Piepenbrock, C., Olek, A. & Beck, S. (2002). Epigenomics: genome-wide study of methylation phenomena. Curr. Issues Mol. Biol. 4, 111–128.
32. Beck, S., Olek, A. & Walter, J. (1999). From genomics to epigenomics: a loftier view of life. Nature Biotechnol. 17, 1144.
33. Gowher, H. & Jeltsch, A. (2002). Molecular enzymology of the catalytic domains of the Dnmt3a and Dnmt3b DNA methyltransferases. J. Biol. Chem. 277, 20409–20414.
34. Reither, S., Li, F., Gowher, H. & Jeltsch, A. (2003). Catalytic mechanism of DNA-(cytosine-C5)-methyltransferases revisited: covalent intermediate formation is not essential for methyl group transfer by the murine Dnmt3a enzyme. J. Mol. Biol. 329, 675–684.
35. Di Croce, L., Raker, V. A., Corsaro, M., Fazi, F., Fanelli, M., Faretta, M. et al. (2002). Methyltransferase recruitment and DNA hypermethylation of target promoters by an oncogenic transcription factor. Science, 295, 1079–1082.
36. Chedin, F., Lieber, M. R. & Hsieh, C. L. (2002). The DNA methyltransferase-like protein DNMT3L stimulates de novo methylation by Dnmt3a. Proc. Natl Acad. Sci. USA, 99, 16916–16921.
37. Suetake, I., Shinozaki, F., Miyagawa, J., Takeshima, H. & Tajima, S. (2004). DNMT3L stimulates the DNA methylation activity of Dnmt3a and Dnmt3b through a direct interaction. J. Biol. Chem. 279, 27816–27823.
38. Gowher, H., Liebert, K., Hermann, A., Xu, G. & Jeltsch, A. (2005). Mechanism of stimulation of catalytic activity of Dnmt3A and Dnmt3B DNA-(cytosine-C5)-methyltransferases by Dnmt3L. J. Biol. Chem. In the press.
39. Pfeifer, G. P., Steigerwald, S. D., Hansen, R. S., Gartler, S. M. & Riggs, A. D. (1990). Polymerase chain reaction-aided genomic sequencing of an X chromosome-linked CpG island: methylation patterns suggest clonal inheritance, CpG site autonomy, and an explanation of activity state stability. Proc. Natl Acad. Sci. USA, 87, 8252–8256.
40. Feltus, F. A., Lee, E. K., Costello, J. F., Plass, C. & Vertino, P. M. (2003). Predicting aberrant CpG island methylation. Proc. Natl Acad. Sci. USA, 100, 12253–12258.
41. Krieg, A. M. (2002). CpG motifs in bacterial DNA and their immune effects. Annu. Rev. Immunol. 20, 709–760.
42. Rui, L., Vinuesa, C. G., Blasioli, J. & Goodnow, C. C. (2003). Resistance to CpG DNA-induced autoimmunity through tolerogenic B cell antigen receptor ERK signaling. Nature Immunol. 4, 594–600.
43. Klinman, D. M., Yi, A. K., Beaucage, S. L., Conover, J. & Krieg, A. M. (1996). CpG motifs present in bacteria DNA rapidly induce lymphocytes to secrete interleukin 6, interleukin 12, and interferon gamma. Proc. Natl Acad. Sci. USA, 93, 2879–2883.
44. Roth, M. & Jeltsch, A. (2000). Biotin–avidin microplate assay for the quantitative analysis of enzymatic methylation of DNA by DNA methyltransferases. Biol. Chem. 381, 269–272.

Edited by J. Karn (Received 24 January 2005; received in revised form 18 February 2005; accepted 18 February 2005)