Chromatin: Methyl-CpG-Binding Proteins
D G Skalnik, Indiana University, Indianapolis, IN, USA
© 2013 Elsevier Inc. All rights reserved.
Glossary
CpG island: Region of DNA containing a cluster of unmethylated CpG dinucleotides, often found near promoters of actively expressed genes.
Epigenetics: Heritable patterns of gene expression that occur in the absence of altered DNA sequence.
Euchromatin: Open form of chromatin associated with transcriptionally active genes.
Genomic imprinting: Phenomenon whereby maternally derived and paternally derived alleles of a gene exhibit distinct expression patterns as a result of distinct epigenetic modifications established during gametogenesis.
Heterochromatin: Highly compact form of transcriptionally inactive chromatin.

The development of multicellular organisms requires the generation of a variety of terminally differentiated cell types from a common set of totipotent stem cells. Integral to this process is the phenomenon of cell memory (e.g., a liver cell gives rise to progeny liver cells upon cell division, rather than kidney or spleen cells). At the molecular level, cell memory derives from distinct patterns of gene expression that are characteristic of each cell lineage, even though nearly all somatic cells in a multicellular organism contain the same genome. The term epigenetics refers to heritable patterns of gene expression that occur in the absence of altered DNA sequence, and represents the underlying molecular mechanism for cell memory.

The CpG dinucleotide represents an important regulatory component of mammalian genomes. The cytosine of this dinucleotide serves as the target of DNA methyltransferases, which catalyze the formation of 5-methylcytosine, a critical epigenetic modification of DNA. Methylated DNA is associated with transcriptionally inactive genes and is generally tightly packaged as heterochromatin, while actively expressed genes are generally hypomethylated and present in a more open euchromatin configuration. Methyl-CpG-binding proteins contribute to both the formation of heterochromatin and the repression of transcription.

Epigenetic Regulation of Gene Expression

The DNA of higher eukaryotes is found tightly associated with proteins in vivo. In fact, at least 50% of the mass of a chromosome is contributed by proteins. Although chromatin was originally thought to provide primarily a packaging function, it is now appreciated that chromatin structure is highly dynamic and plays a critical role in the epigenetic regulation of gene expression. It is necessary to understand the nature of chromatin structure to appreciate the function of methyl-CpG-binding proteins.

Nucleosomes

The primary level of chromatin structure is the nucleosome, in which a DNA double helix is wrapped nearly twice around a core octamer of eight histone protein molecules (two copies each of histone H2A, H2B, H3, and H4). Nucleosomes are separated by short linkers of DNA, and approximately 200 base pairs of DNA are present in each nucleosome repeat. Chromatin in this configuration is referred to as a ‘beads-on-a-string’ structure, or euchromatin, and can be transcriptionally active. Additional packaging of nucleosomes into a 30-nm helical array is found in heterochromatin, which is transcriptionally inactive and is also tightly associated with an additional histone species (H1).
Histone Code

Determination of the structure of the nucleosome revealed that the amino terminus of each histone molecule extends away from the octamer core. These histone tails are sites of covalent modification, such as phosphorylation, acetylation, and methylation. At least three dozen distinct sites of covalent modification within histone tails have been identified, and many of these are associated specifically with either euchromatin or heterochromatin (Figure 1). The sum of histone modifications found in chromatin is referred to as the histone code, and provides important regulatory information with respect to chromatin structure and ultimately gene expression. For example, acetylated histones are found preferentially in euchromatin. The net acetylation state of a specific region of chromatin is the result of a balance between the actions of histone acetyltransferases (HATs) and histone deacetylases (HDACs). Similarly, methylated lysine 4 of histone H3 is found in euchromatin, while heterochromatin preferentially contains histone H3 methylated at the lysine 9 position. Lysine residues in histone proteins can be mono-, di-, or trimethylated. Posttranslationally modified histone molecules serve as binding sites for effector proteins implicated in the control of chromatin structure. For example, methylated lysine 9 of histone H3 serves as a binding site for the heterochromatin-associated protein HP1, which recruits a histone methyltransferase to the chromatin. This leads to the methylation of adjacent histone H3 lysine 9 residues, formation of additional HP1-binding sites, and spreading of the heterochromatin configuration.
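The HP1-driven spreading described above can be pictured with a toy model. The following is only a minimal sketch under stated assumptions: nucleosomes are treated as a one-dimensional array, every methylated nucleosome is assumed to methylate both immediate neighbors in each round, and the function names (`spread_h3k9_methylation`, `render`) are hypothetical illustrations rather than anything taken from the literature.

```python
def spread_h3k9_methylation(marks, rounds):
    """Toy model of heterochromatin spreading along a row of nucleosomes.

    marks  -- list of booleans, True where histone H3 lysine 9 is methylated
    rounds -- number of spreading rounds to simulate

    Assumption: in each round, every methylated nucleosome (pictured as bound
    by HP1 and its recruited methyltransferase) methylates both neighbors.
    """
    state = list(marks)
    for _ in range(rounds):
        nxt = state[:]
        for i, methylated in enumerate(state):
            if not methylated:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(state):
                    nxt[j] = True
        state = nxt
    return state


def render(state):
    """Draw methylated nucleosomes as 'M' and unmodified ones as '-'."""
    return "".join("M" if m else "-" for m in state)


# Nine nucleosomes with a single methylated nucleosome in the middle.
start = [False] * 9
start[4] = True
for r in range(4):
    print(r, render(spread_h3k9_methylation(start, r)))
```

In this simplified picture the methylated domain grows by one nucleosome on each side per round. Real chromatin is not expected to spread this uniformly; the sketch only illustrates the positive-feedback logic of the mechanism.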
Molecular Biology | Chromatin: Methyl-CpG-Binding Proteins
Histone H2A: Ser 1, P; Lys 5, Ac; Lys 9, Ac; Lys 119, Ub
Histone H2B: Lys 5, Ac; Lys 12, Ac; Lys 15, Ac; Lys 20, Ac; Lys 24, Ac; Lys 123, Ub
Histone H3: Arg 2, Me; Lys 4, Me; Lys 9, Me; Ser 10, P; Lys 14, Ac; Arg 17, Me; Lys 18, Ac; Lys 23, Ac; Arg 26, Me; Lys 27, Me; Ser 28, P; Lys 36, Me; Lys 79, Me
Histone H4: Ser 1, P; Arg 3, Me; Lys 5, Ac; Lys 8, Ac; Lys 12, Ac; Lys 16, Ac; Lys 20, Me

Figure 1 Covalent modifications of histones. Several dozen covalent histone tail modifications that contribute to the histone code are summarized. For each species of histone protein, the amino acid residue, position, and type of modification are presented. Me, mono-, di-, or trimethylation; P, phosphorylation; Ac, acetylation; Ub, ubiquitination.

Cytosine Methylation

Another covalent modification associated with heterochromatin and repression of gene expression is cytosine methylation, which occurs in the DNA double helix predominantly in the context of the 5′-CpG-3′ dinucleotide (where p denotes the phosphodiester linkage between the cytosine and guanine nucleotides). In mammalian genomes, approximately 75% of CpG motifs are methylated, and these are concentrated in areas of repetitive DNA sequences. This dinucleotide is the target of DNA methyltransferase proteins, which catalyze the addition of a methyl group to the C5 position of the cytosine ring. DNA methyltransferase 1 is the major maintenance methyltransferase. It exhibits high affinity for hemi-methylated DNA (the immediate product of semi-conservative DNA replication), and catalyzes the addition of a methyl group to the CpG cytosines within the newly synthesized DNA strand. DNA methyltransferases 3a and 3b are de novo DNA methyltransferases that are responsible for establishing specific patterns of cytosine methylation during gametogenesis and early development. The mechanisms by which appropriate patterns of cytosine methylation are initially established are not well understood.

Functions of Cytosine Methylation

Appropriate patterns of cytosine methylation within DNA are essential for numerous biological processes. This is clearly reflected in the finding that mice lacking one or more DNA methyltransferase genes die prior to or shortly after birth.

Regulation of Gene Expression

As mentioned above, cytosine methylation is associated with heterochromatin and repressed gene expression. Because the majority of 5-methylcytosine is found in repetitive DNA elements, cytosine methylation may provide a defense mechanism against expression of parasitic DNA. Indeed, loss of appropriate cytosine methylation results in the induction of repetitive DNA expression. Maintenance of repetitive elements in a heterochromatin configuration may also be important for chromosome stability by preventing homologous recombination between repetitive element sequences.

X-Chromosome Inactivation

X-chromosome inactivation refers to the irreversible inactivation, during early embryonic development, of one of the X-chromosomes present in each cell of a female. The inactive X-chromosome becomes heavily methylated at CpG motifs and takes on a heterochromatin configuration.

Genomic Imprinting

Distinct patterns of cytosine methylation are established throughout the genome during spermatogenesis and oogenesis, leading to parent-of-origin epigenetic marks that distinguish maternally derived alleles from paternally derived ones. Such genomic imprinting can result in a uni-allelic pattern of gene expression. Genomic imprinting is essential for normal development, as zygotes carrying two sets of paternal or two sets of maternal alleles fail to develop normally, even though they carry an appropriate diploid complement of the DNA genome. To date, approximately 150 mammalian genes have been identified that exhibit genomic imprinting.

Cancer

A large number of DNA mutations in oncogenes and tumor suppressor genes have been identified in cancer cells. More recently, it has been recognized that cancer cells also exhibit epigenetic aberrations, including global cytosine hypomethylation as well as hypermethylation of CpG motifs specifically within the promoters of tumor suppressor genes. Pharmacologic modulation of the epigenome represents a novel chemotherapy strategy. For example, inhibition of HDAC enzymes is being utilized to induce expression of inappropriately silenced tumor suppressor genes in cancer patients.

Methyl-CpG-Binding Proteins

The function of methylated CpG motifs is mediated in part by methyl-CpG-binding proteins, which specifically bind to DNA at sites of methyl-CpG and affect transcription rates and local chromatin structure. A small family of proteins shares a common DNA-binding domain, the methyl-CpG-binding domain (MBD) (Figure 2).

MeCP2
MeCP2 is the founding member of the MBD protein family. It localizes to heterochromatin in vivo and represses transcription via a carboxyl-terminal transcriptional repression domain (TRD). MeCP2 also contains an amino-terminal MBD DNA-binding domain. One mechanism for MeCP2-mediated transcriptional repression appears to be regulation of chromatin structure, as the TRD of MeCP2 interacts with a co-repressor that contains an HDAC. Disruption of the X-chromosome-linked MeCP2 gene in mice leads to premature death, and mutations in the human MeCP2 gene cause Rett syndrome, a progressive neurodegenerative disease found nearly exclusively in girls. Deficiency of MeCP2 in neural cells leads to increased histone acetylation and inappropriate transcription of repetitive DNA elements. The severity of symptoms in Rett syndrome depends on the exact nature of the mutation present in the MeCP2 gene (most mutations alter the sequence of the MBD or TRD) and on the mosaic pattern of mutant gene expression resulting from random X-chromosome inactivation.

Figure 2 Schematic comparison of the MBD family of proteins (MeCP2, MBD1, MBD2, MBD3, MBD4, MBD5, and MBD6). The conserved methyl-CpG-binding domains (MBD) present in each protein are aligned. Other labeled domains include CXXC and glycosylase domains. TRD, transcriptional repression domain.
MBD1

Similar to MeCP2, MBD1 contains an amino-terminal MBD DNA-binding domain and a carboxyl-terminal TRD, although the TRD exhibits no sequence similarity to the MeCP2 TRD. MBD1 binds methylated DNA targets and represses gene expression. Various isoforms of MBD1 produced by differential RNA splicing additionally contain two or three CXXC domains, a cysteine-rich motif also found in DNA methyltransferase 1, human trithorax (HRX), and CXXC finger protein 1 (CFP1). The CXXC domain constitutes the DNA-binding domain of CFP1, and is responsible for that factor’s binding specificity for DNA containing unmethylated CpG motifs. Similarly, HRX also binds unmethylated CpG motifs, and the third CXXC domain of MBD1 is responsible for this factor’s ability to bind unmethylated target sequences and repress transcription. The significance of the CXXC domain for MBD1 function is unclear, as it is not required for binding to sites of methyl-CpG motifs. Mice lacking MBD1 are viable, but suffer from reduced neural differentiation and increased genomic instability.
MBD2 and MBD3

MBD2 and MBD3 are the most structurally related members of the MBD family, sharing 75% similarity over the entire length of MBD3, the shortest member of the family. MBD2 contains an additional 140-amino-acid amino-terminal extension. Despite this structural similarity, these MBD proteins exhibit quite dissimilar properties. MBD2 binds to methylated DNA and represses transcription via overlapping MBD and TRD domains. MBD2 interacts with HDAC1, and is a component of the methyl-DNA-binding complex MeCP1. The ability of MBD2 to repress transcription is sensitive to inhibitors of HDAC activity, suggesting a mechanism for cross-talk between cytosine methylation and chromatin structure. Surprisingly, mice lacking the MBD2 gene are viable despite lacking the MeCP1 complex. By contrast, MBD3 contains a divergent MBD domain and fails to interact directly with methylated DNA. However, similar to MeCP2, MBD3 is a component of a co-repressor complex. The importance of MBD3 is illustrated by the finding that mice lacking the MBD3 gene fail to develop to birth.

Figure 3 Summary of chemical relationships between cytosine, 5-methylcytosine, uracil, and thymine, including how these species interconvert in vivo: DNMT-catalyzed methylation converts cytosine to 5-methylcytosine; deamination converts cytosine to uracil and 5-methylcytosine to thymine; uracil DNA glycosylase-mediated and MBD4-mediated DNA repair act on the respective deamination products. DNMT, DNA methyltransferases.
MBD4

MBD4 is an unusual member of the MBD protein family, as it apparently plays no role in transcriptional repression; rather, it functions as a DNA-repair protein. Cytosine exhibits a propensity to deaminate spontaneously to form uracil (Figure 3). Cells contain DNA repair enzymes (e.g., uracil DNA glycosylase) that recognize a uracil base in DNA as a site of damage, and excise it to permit replacement with cytosine. This may explain the presence of thymine in DNA rather than uracil, as it would be more difficult for a cell to recognize a site of cytosine deamination if uracil were a normal base used for encoding genetic information in DNA. However, deamination of 5-methylcytosine leads to the formation of thymine, a normal DNA base that is less readily recognized as damage. This spontaneous deamination is thought to account for the relative rarity of the CpG dinucleotide in mammalian genomes (only 10% of the expected frequency). Conversely, CpG motifs near widely expressed genes are usually unmethylated, and hence avoid this mutagenic pressure. This is hypothesized to explain the clustering of unmethylated CpG motifs in CpG islands near the promoters of approximately 50% of mammalian genes. The MBD domain of MBD4 binds with highest affinity to methylated CG/TG mismatched sequences, and the protein also contains a glycosylase activity in its carboxyl terminus, which removes the mismatched thymine base from DNA to permit replacement with cytosine. The biological significance of this enzymatic activity is illustrated by an elevated rate of mutation at CpG motifs and increased tumorigenesis in mice lacking MBD4.
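The depletion of CpG by deamination can be made concrete with a short calculation. The sketch below assumes the widely used observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G), which is not given in this article. The function names and the 16-nucleotide toy sequence are hypothetical and chosen only for illustration.

```python
def cpg_observed_over_expected(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Assumption: expected CpG frequency is estimated from the base composition
    of the same sequence, a common convention not defined in the article.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def deaminate_methylated_cpgs(seq, methylated_positions):
    """Convert the C of each listed CpG to T, mimicking 5-methylcytosine
    deamination that escapes repair. Positions are 0-based indices of the C."""
    bases = list(seq.upper())
    for pos in methylated_positions:
        is_cpg = bases[pos] == "C" and pos + 1 < len(bases) and bases[pos + 1] == "G"
        if is_cpg:
            bases[pos] = "T"
    return "".join(bases)


# Hypothetical toy sequence with CpG dinucleotides at positions 3, 7, and 11.
toy = "GCACGTCCGATCGGCA"
mutated = deaminate_methylated_cpgs(toy, [3, 7])

print(toy, round(cpg_observed_over_expected(toy), 2))
print(mutated, round(cpg_observed_over_expected(mutated), 2))
```

In this toy case, converting two of the three CpG sites to TpG halves the ratio (from 1.6 to 0.8), whereas the unmethylated CpG at position 11 is left intact, mirroring the argument made for CpG islands.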
MBD5 and MBD6

Relatively little is known about these widely expressed MBD-containing proteins. However, haploinsufficiency of MBD5 leads to severe intellectual disability in humans.
Cross-Talk between DNA Methylation and Chromatin Structure

The nature of epigenetic modifications associated with transcriptionally active or inactive genomic loci is now well established. Current investigations are focused on the regulation of these modifications. How does a cell choose which regions of DNA to methylate and/or assemble into heterochromatin? Recently, it has become clear that DNA methylation and chromatin structure are highly interdependent epigenetic processes. For example, perturbations of the cellular machinery responsible for nucleosome remodeling or histone modifications also result in aberrations in the pattern of cytosine methylation. Conversely, the association of MBD family members with chromatin-modifying enzymes (e.g., HDACs) suggests a mechanism whereby cytosine methylation may provide an epigenetic mark that permits targeted recruitment of chromatin-remodeling enzymes to specific genomic loci. Hence, cytosine methylation patterns and the histone code may represent mutually reinforcing processes that lead to the stable maintenance of chromatin packaging and gene expression patterns characteristic of cellular memory.
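One ingredient of this stable cellular memory is the maintenance methylation described under Cytosine Methylation, in which hemi-methylated CpG sites produced by semi-conservative replication are restored to the fully methylated state. The following toy sketch illustrates that logic only. It assumes a duplex can be represented as two lists of booleans (one per strand, one entry per CpG site) and that maintenance is perfectly efficient; the function names `replicate` and `maintain` are hypothetical.

```python
def replicate(duplex):
    """Semi-conservative replication: each daughter duplex keeps one parental
    strand (with its methyl marks) paired with a new, unmethylated strand."""
    top, bottom = duplex
    unmethylated = [False] * len(top)
    return (list(top), unmethylated[:]), (unmethylated[:], list(bottom))


def maintain(duplex):
    """Maintenance methylation (DNMT1-like, assumed 100% efficient): a CpG
    site that is methylated on one strand becomes methylated on both.
    Sites unmethylated on both strands are left untouched."""
    top, bottom = duplex
    restored = [t or b for t, b in zip(top, bottom)]
    return restored[:], restored[:]


# Hypothetical pattern over five CpG sites, symmetric on both strands.
pattern = [True, False, True, True, False]
parent = (pattern[:], pattern[:])

for hemi in replicate(parent):
    daughter = maintain(hemi)
    print("hemi-methylated:", hemi, "-> after maintenance:", daughter)
```

Under these assumptions both daughter duplexes recover the parental pattern, and unmethylated sites stay unmethylated, which is why a pattern established de novo can be inherited through cell division without any change in DNA sequence.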
See also: Cell Architecture and Function: Chromosome Organization and Structure, Overview; Molecular Biology: Chromatin: Nucleosome Positioning – the GAL Promoter; Chromatin: Physical Organization; Regulation of Chromatin Dynamics.
Further Reading

Ballestar E and Wolffe AP (2001) Methyl-CpG-binding proteins: Targeting specific gene repression. European Journal of Biochemistry 268: 1–6.
Bienvenu T and Chelly J (2006) Molecular genetics of Rett syndrome: When DNA methylation goes unrecognized. Nature Reviews Genetics 7: 415–426.
Esteller M (2008) Epigenetics in cancer. New England Journal of Medicine 358: 1148–1159.
Felsenfeld G and Groudine M (2003) Controlling the double helix. Nature 421: 448–453.
Jaenisch R and Bird A (2003) Epigenetic regulation of gene expression: How the genome integrates intrinsic and environmental signals. Nature Genetics 33: 245–254.
Jones PA and Baylin SB (2007) The epigenomics of cancer. Cell 128: 683–692.
Jorgensen HF and Bird A (2002) MeCP2 and other methyl-CpG-binding proteins. Mental Retardation and Developmental Disabilities Research Reviews 8: 87–93.
Nakao M (2001) Epigenetics: Interaction of DNA methylation and chromatin. Gene 278: 25–31.
Wade PA (2001) Methyl CpG-binding proteins and transcriptional repression. BioEssays 23: 1131–1137.
Wade PA (2001) Methyl CpG-binding proteins coupling chromatin architecture to gene regulation. Oncogene 20: 3166–3173.