Genomic studies of transcription factor–DNA interactions Devanjan Sikder and Thomas Kodadek A central issue in the regulation of gene expression is the physical association of transcription factors with relevant promoter sequences. Recently, technological advances have allowed researchers to analyze these processes on a genomic scale. In particular, the combination of the chromatin immunoprecipitation (ChIP) technique with microarray analysis (the ‘ChIP to chip’ experiment) is providing a wealth of new and surprising data on transcription factor–chromatin interactions. These advances are reviewed here. We also discuss future challenges in the area. Addresses Departments of Internal Medicine and Molecular Biology, Center for Biomedical Inventions, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, Texas 75390-8573, USA Corresponding authors: Sikder, D (
[email protected]); Kodadek, T (
[email protected])
Current Opinion in Chemical Biology 2005, 9:38–45 This review comes from a themed issue on Proteomics and genomics Edited by Benjamin F Cravatt and Thomas Kodadek Available online 8th January 2005 1367-5931/$ – see front matter # 2005 Elsevier Ltd. All rights reserved. DOI 10.1016/j.cbpa.2004.12.008
Introduction Most biological processes are regulated at the level of transcription. In mammalian cells, the transcription of almost every gene is regulated by several positive and negative transcription factors that recognize specific DNA sequences near the target gene, often in promoter regions that are located upstream of the transcriptional start site. Transcription factor–DNA association in a living cell is a complex process [1]. The genome is vast and the DNA is packaged into a variety of chromatin structures that determine its accessibility to a protein [2]. Furthermore, most transcription factors interact with other sequencespecific binding proteins, chromatin remodeling and modification complexes and the general transcription machinery. Any or all of these protein–protein interactions may affect the DNA-binding characteristics of the factor of interest. Thus, while detailed in vitro studies of DNA–protein interactions have provided, and will continue to provide, useful information, it is clear that studies of transcription factor–DNA interactions in living cells are critical to understanding transcriptional regulation (or any Current Opinion in Chemical Biology 2005, 9:38–45
other nuclear process). As described below, this field has undergone a revolution recently, largely because of the development of the so-called ‘ChIP to chip’ assay, which combines the powerful technique of chromatin immunoprecipitation with DNA microarray analysis. Before considering these recent advances, however, it is worthwhile to review briefly earlier work and thinking in this area, which has set the stage for the modern era.
Early days Until quite recently, the analysis of transcription factor– DNA interactions in living cells at anything close to a genomic scale was a very challenging proposition. This was due largely to limitations in analytical techniques. A good example is the pioneering study of Mark Biggin and colleagues of Drosophila regulators Even-skipped (Eve), Fushi tarazu (Ftz) and Zeste [3]. These proteins, which are part of a large superfamily of homeodomain factors, provided a conundrum for early functional genomics researchers. In vitro, homeodomain proteins exhibit quite modest sequence specificity, often as little as 3–4 base pairs [4]. Simple arithmetic would suggest that there must be a massive number of sites in the Drosophila genome that would fit this description. So how could these proteins achieve specific binding to the promoters that they regulate? First, it is important to note that there is an inherent bias in this question. It was assumed that there must exist mechanisms in the cell that elaborate the very modest intrinsic DNA-binding specificity of homeodomain proteins. The alternative explanation, that large amounts of these homeodomain proteins were produced and are scattered all over the genome, was not given much credibility. Biggin and co-workers analyzed Ftz and Eve distribution in Drospophila early embryos using UV cross-linking and Southern analysis [3]. This was a heroic effort. UV crosslinking is a relatively low-efficiency process for most dsDNA-binding proteins and the DNA in the product cannot be amplified by PCR. Thus, a great deal of material must be brought through this protocol to analyze the products. The upside is that this procedure provides at least a semi-quantitative measure of protein–DNA associations along the DNA. The surprising result of these studies was that while Ftz and Eve were found associated with the promoters they regulate, as expected, lower levels of occupancy could also be detected at many sites on the genome, a conclusion to which the transcription community was quite resistant. However, several careful subsequent experiments proved Biggin correct. For example, it was found that an embyro contained >50 000 molecules of Eve and Ftz, which explains www.sciencedirect.com
Genomic studies of transcription factor–DNA interactions Sikder and Kodadek 39
how they can be scattered across the genome, yet still occupy specific promoter sites. This is not to say that all transcription factors evince such modest specificity in vivo or that they exist at such high levels. For example, the well-studied Gal4 transcription factor of Saccharomyces cerevisiae exists at very low copy number ( 60 dimers per cell; T Kodadek and SA Johnston, unpublished data) and has much higher specificity than homeodomain factors, recognizing sequences of the form 50 -CGGN11CCG, where N can be A, C, T or G, although the protein has some preferences with regard to this central sequence [5]. Furthermore, Gal4 binds to promoters in concert with the general transcription apparatus [6], further enhancing its specificity for GAL promoters. Nonetheless, even in this system, association of the transcription factor with sites outside of traditional promoters is observed [7]. In all of these systems, from the lowest specificity homeodomain to the more discriminating factors, it will be extremely interesting to examine the functional consequence of activator binding at these extra-promoter sites.
The ChIP to chip assay The mapping of transcription factor binding sites on large swaths of the genome has undergone a revolution over the past few years, largely due to the development of the ‘ChIP to chip’ technique. This process, which in principle can provide a genome-wide view of protein–DNA interactions, is an elaboration of the venerable chromatin immunoprecipitation (ChIP) technique. In the ChIP protocol (see Figure 1a), cells or tissues are treated with formaldehyde, a cell-permeable small molecule that can mediate protein– protein and protein–DNA cross-linking through Schiff base formation. If one wishes to increase the degree of protein–protein cross-linking (for example to better observe indirect protein–DNA interactions in this assay), more efficient protein–protein cross-linking by dimethyl adipimidate can be done first, followed by formaldehyde treatment [8]. Next, the cells are lysed, the chromatin is isolated and sheared into 200–1000 base pair fragments, usually by sonication. The protein of interest is then immunoprecipitated using a specific antibody and any DNA that is cross-linked to it will be co- immunoprecipitated. Although the amount of cross-linked DNA is small, it can be detected by first reversing the cross-links through incubation in a low pH buffer (which hydrolyzes the Schiff base adducts), then amplifying the region of interest using appropriate PCR primers. This is the major advantage of the ChIP technique over other protocols for visualizing DNA–protein interactions in cells; the cross-link is readily reversible and the DNA can be amplified by PCR. The ChIP to chip version of this protocol [9] (Figure 1b) differs only in the post-IP steps. In this case, one does not amplify a particular co-immunoprecipitated DNA with defined primers, but instead endeavors to analyze many www.sciencedirect.com
or even all of the DNAs enriched during the IP step. Briefly, the following steps are employed to do this. The DNA released after reversal of the cross-link is repaired using a DNA polymerase to produce blunt ends. A common linker is then ligated to each DNA. All of the DNA sequences present are then amplified using PCR primers complementary to the common linkers. During this step, fluorescent labels are incorporated through the addition of labeled nucleotide triphosphates [10]. Alternatively, aminoallyl-substituted nucleotides are sometimes incorporated, allowing subsequent attachment of activated ester derivatives of fluorescent dyes post-amplification [11]. This protocol is applied to both the total DNA, which is labeled with one color dye (often Cy3), as well as the DNA sample enriched in the immunoprecipitation, which is labeled with a different color fluor (often Cy5). These two samples are then mixed and hybridized to a microarray composed of nucleic acid probes that represent the regions of the genome that one would like to probe for binding of the transcription factor of interest. In this type of two-color experiment, binding of the protein to a region of DNA is inferred if the intensity of the immunoprecipitated DNA exceeds significantly that of the intensity of the total DNA (i.e. an enrichment has occurred in the immunoprecipitation). It is up to the investigator to determine what ‘significantly’ means (i.e. how much of an enrichment is required to score a loci as a site of protein binding). In summary, the ChIP to chip experiment employs a simple strategy for amplifying unknown pieces of DNA through the attachment of common linkers (a well-used strategy in molecular biology) and then employs the power of microarray technology to conduct the analysis. This is one of the few cases in proteomics where one can bring to bear the full power of genomics technology, in that it reports a property of a protein (the DNA-binding factor) indirectly through a nucleic acid readout.
DNA microarrays for ChIP to chip assays The ideal microarray for a ChIP to chip experiment would contain DNA probes that would allow whole genome coverage. These do not currently exist for the simple reason that too many probes would be required. The human genome, for example, contains around 3 109 base pairs. Given that the inherent resolution of the ChIP assay is 500 bp (defined by the average fragment size after shearing the chromatin), one would like to have a DNA probe for approximately 100 bp region of DNA to ensure redundant coverage (not all DNA probes work well). Thus, an array of over 10 million features would be required. This exceeds the capacity of even the most advanced chip-making capabilities available today. What kind of arrays are available? First, it is worthwhile to review briefly the two basic types of DNA microarrays. The first arrays were created by spotting DNAs onto polylysine-modified glass slides (for a review, see [12]). Current Opinion in Chemical Biology 2005, 9:38–45
40 Proteomics and genomics
Figure 1
(a)
(b)
Input DNA
IP enriched DNA
T4 DNA pol Repaired ends T4 DNA ligase Linker ligation Cross-linking X
dTTP
dCTP
dATP dCTP LM-PCR and labeling
X
X
dATP dGTP
DNA shearing
dGTP
dTTP
X Labeled DNA
X
X Immunoprecipitation
X X
X
Y
Y
X
Hybridization
Analyze
DNA elution Current Opinion in Cell Biology
Procedure for genome-wide association study. (a) Chromatin immunoprecipitation. Cells are grown under the desired experimental conditions. Protein–protein and protein–DNA interactions are trapped using a cross-linking agent, usually formaldehyde. Following cell lysis, DNA is mechanically sheared and then immunoprecipitated using an antibody or a combination of antibodies against a protein of interest. Cross-links are reversed at low pH. Following deproteinization, DNA is purified. (b) ChIP to chip experiment. Staggered ends of input (non-enriched DNA) and IP-enriched DNA are repaired using T4 DNA polymerase and then ligated to unidirectional linkers. During ligation mediated PCR (LM-PCR), input DNA is labeled with a fluorophore (Cy3), while IP-enriched DNA is labeled with a different fluorescent dye (Cy5). Cy3- and Cy5-labeled DNA pools are mixed and hybridized to a microarray and analyzed.
This is now most often done using synthetic oligonucleotides 50–80 residues long. Obviously, however many probes one wishes to have equals the number of synthetic oligonucleotides (or PCR products) one must spot onto slides. The arrays are made using pin spotting robots that transfer a solution of the DNA from a 384-well plate to the microscope slide. Due to the inevitable spreading of the liquid on the plate, feature sizes below 100 microns are difficult to achieve. This and other practical considerations limit the density of spotted arrays to approximately 20 000–30 000 features. A fundamentally different type of approach to the construction of oligonucleotide arrays is to synthesize the oligomers in situ on the array. There are several ways to do this, the most important being photolithography [13]. These chips, marketed commercially by Affymetrix, can be of much higher density, with the most advanced containing up to two million features. A further advantage of Affymetrix-style chips over spotted arrays is that they are more reproducible [14], as physical spotting can produce a variety of artifacts, such as non-homogenous spots, carryover of one probe onto another feature Current Opinion in Chemical Biology 2005, 9:38–45
(through incomplete washing of pins), etc. Therefore, it is fair to say that most genomics researchers prefer to use Affymetrix arrays if possible. However, these chips are quite expensive. Furthermore, the process by which they are made, which requires micromachined physical masks for the photolithographic chemistry, is not particularly flexible. Rather it is good for making a large number of a few types of arrays. Since most of the emphasis in the microarray community to date has been on gene expression profiling, most chips contain probes to coding sequences in the genome rather than promoters or other extragenic regions. ‘Tiling’ arrays produced photolithographically [15] and suitable for unbiased ChIP to chip analyses (i.e. not focused on coding regions) are just beginning to become available and most of the work has been done with custom-made spotted arrays. However, certain technical advances in the construction of these arrays, such as digital photolithography, and the increasing importance of transcription factor mapping are likely to make these tools much more available in the future. This is fortunate, because the availability of suitable arrays remains the greatest limiting factor in making this powerful tool accessible to large numbers of laboratories. www.sciencedirect.com
Genomic studies of transcription factor–DNA interactions Sikder and Kodadek 41
Applications of the ChIP to chip assay The first ChIP to chip experiments were reported at more or less the same time by the Young and Brown groups. Both studied transcription-factor–chromatin binding in the yeast S. cerevisiae. Yeast is a good model system for many reasons, one of which is that its genome is much smaller than that of mammals, allowing microarrays to be produced that provide genome-wide coverage. Young’s group [10] mapped binding sites for Gal4 and Ste12 and identified pathways regulated by these transcription factors in yeast. Brown and co-workers, who were instrumental in the development of spotted DNA microarray technology, conducted similar experiments on the yeast cell cycle transcription factors SBF and MBF [16]. Since that time, a great deal of work has been done on yeast transcription factors [17,18,19], culminating with the publication by Young’s group of the genome-wide association map for 106 epitope-tagged transcription factors [20]. These data, combined with expression analysis datasets [21], provide a rich overview of transcriptional regulation in this simple eukaryotic organism. This technology was later applied to mammalian systems. Initial studies addressing recruitment of Max, Gata1, E2f and Rb transcription factors necessarily surveyed only a small fraction of the genome [22–24] due to the limitations of the available arrays mentioned above as well as the incomplete nature of the genome annotation effort, which has improved steadily, allowing more and more promoter regions to be identified [25,26,27]. In mapping the cMyc binding sites, Li et al. [25] used arrays that had one spot per regulatory region, each of which was about 0.9 kB. This design would, however, fail to report inter-
actions that involve further upstream regions. At the other extreme, to map GATA-1 binding sites, Snyder’s group [22] used arrays that represented the entire 75 kb promoter region of b-globulin dissected into 1 kB fragments. Although this was a small array it had unbiased representation of the entire b-globulin locus. Owing to the complexity of the mammalian genome and the limited understanding of eukaryotic promoters, it was harder to construct promoter arrays. Because CpG islands are frequently associated with promoters [28], Farnham’s group and others employed arrays that represented PCR-amplified CpG islands to map binding sites for Rb, E2F, Suz12 and methylated histones [23,27,29,30]. A comprehensive list of studies that have been carried out to date appears in Table 1. These studies highlighted the potential regulatory roles mediated by the transcription factors on the target genes. Yet another theme that emerged was that the transcription factor binding sites were both prevalent and functional on non-consensus sites as well [24]. This raised the fundamental question of whether using promoter-focused arrays to study genomic transcription factor–DNA association was introducing an unjustified bias. With this in mind, Snyder’s group employed microarrays that represented virtually all the non-repetitive sequences of chromosome 22 to generate association profiles for NF-kB [31]. In addition to expected binding sites, NF-kB was found to bind many non-consensus sequences. But the most surprising result was that more than 40% of the observed binding sites were mapped to non-coding regions and pseudogenes. It could be that these biding events have no functional consequence [32] and simply
Table 1 Genome-wide protein–DNA association studies. Protein
Array type
Species
Reference
Gal4 Ste12 p53, cMyc, Sp1 RNA pol III H3 lys9 methylation CREB NfkB E2F HNF RSC ORC and MCM H2B, H3 and H4 acetylation H3 and H4 acetylation, H3 dimethylation Acetylation of H3 and H4, Di and tri- methylation of H3 SBF, NBF GATA1 cMyc, Max Rb Suz12, H3 Lys27 Rap1 HSF
Intergenic regions Intergenic regions Tiling array Ch. 21 and 22 PCR-amplified intergenic regions CpG microarrays Non-repetitive chromosome 22 sequences Tiling array chromosome 22 CpG Island; PCR amplified intergenic regions 13 000 Promoter regions Intergenic regions Intergenic sequences and open reading frames Intergenic regions Intergenic and coding regions cDNA array and non-repetitive sequence from chromosome 2L Intergenic regions Tiling array, b-globin locus Intergenic regions CpG microarray Tiling, CpG island and oligonucleotide promoter arrays ORFs and intergenic regions Intergenic regions
S. cerevisiae S. cerevisiae Human S. cerevisiae Human Human Human Human Human S. cerevisiae S. cerevisiae S. cerevisiae S. cerevisiae Drosophila
[10] [10,55] [15] [56] [30] [57] [31] [29,58] [59] [60] [61] [62] [63] [64]
S. cerevisiae Human Human Human Human S. cerevisiae S. cerevisiae
[16] [22] [25] [23] [27] [65] [66]
www.sciencedirect.com
Current Opinion in Chemical Biology 2005, 9:38–45
42 Proteomics and genomics
reflect the fact that these sequences exist in the cell and transcription factor binding to them is not detrimental and therefore has not been selected against. To test this idea, Cawley et al. [15] asked if these sites of activator binding were also sites of transcription. The authors combined tiled arrays across non-repeat genomic sequences of chromosomes 21 and 22 with chromatin immunoprecipitation to map occupancy sites for Sp1, cMyc and p53. Only about one-fifth of the sites mapped were coincident with promoters while one-third of the binding sites were in non-coding regions. They further demonstrated that the non-coding sites associated with transcription factors were regulatory regions for noncoding RNA (nc-RNA). Comparison of binding and expression studies revealed that these transcription-factor binding sites and the corresponding nc-RNAs were evolutionarily constrained in the human and mouse genomes. Further work by the authors suggests that expression of these non-coding RNAs is regulated by the bound transcription factors in response to external stimuli. By the time Cawley et al. [15] undertook their research, existing evidence suggested that non-translated RNAs are actively transcribed, accounting for 98% of the entire transcriptional activity of a cell [33,34]. This suggested that non-coding DNA may be endowed with critical regulatory functions, which is supported by their evolutionary conservation [34–36] and their association with development and diseases [37–41]. The significance of Cawley’s observation lies in the finding that transcription from nc-RNA is regulated in the same manner as that of the coding regions, and this was achieved by integrating ChiP to chip assay with expression studies. Another useful application of ChIP to chip assays is the analysis of the effect of post-translational modifications on DNA-associated proteins. This has been brought to bear most powerfully for mapping the distribution of modified histones in the genome and correlating these data with patterns of gene expression derived using microarrays in their ‘traditional’ way [18,27,42]. These types of studies are bound to increase in importance as we discover further post-translational modifications of functional consequence in regulating transcription and as better antibodies become available to recognize modified proteins or the modifications themselves. Post-translational modifications of DNA-binding proteins are often reversible and sometimes result in epigenetic regulations such as positional effects [43], gene silencing [44,45], and global demethylation and methylation [46]. Because these effects are mediated by acetylases, deacetylases, methylases, etc., not surprisingly, they were also the first to be included in transcriptional profiling. Thus, genetic and chemical perturbations of histone deacetylases (HDACs) were performed to identify targets of these players [47]. For example, Grunstein’s group Current Opinion in Chemical Biology 2005, 9:38–45
generated genome-wide maps for HDAC activity by probing for sites that were hyperacetylated upon HDAC disruption [18]. Coupling association maps with genomewide expression studies allowed them to segregate direct targets from the ripple effects observed in a previous study [47]. Several groups generated histone acetylation and methylation maps [42,48] in an effort to correlate these modifications with corresponding transcriptional activity. Hyperacetylation of H3 and H4 and hypermethylation of H3 appear to be common themes employed to activate transcription; conversely, the same sites are deacetylated and hypomethylated to inactivate gene transcription [49]. Do these modifications occur on intergenic regions or the coding regions? While global correlation between methylation and demethylation clearly appears to involve the coding region [42,49], location of acetylation and deacetylation appears to be an issue. A study by Bernstein et al. [42] found the highest correlation between histone acetylation of intergenic regions and transcriptional activity. By contrast, Roh et al. [50] found that the highest acetylation levels are not associated with promoters, but the 50 coding region. Nevertheless these studies clearly demonstrate the significance of employing global mapping techniques in investigating differential effects of post-translational modifications. A direct application of such studies would be the identification of genes silenced by histone modification in carcinogenesis [30]. The significance of such profiling for transcription regulatory proteins is indisputable but such studies have been strikingly lacking. Studies on differential modification of the same protein in diseased and healthy states will provide enormous information and open up new avenues for therapy and further research.
Future directions It is a safe bet that the ChIP to chip assay, with its ability to provide a genomic view of protein–DNA interactions, will only grow in importance and utility. As mentioned above, a critical issue is the development of suitable microarrays to support these studies. Increasing commercial interest in the ChIP to chip application and advances in digital photolithographic synthesis of DNA microarrays [51,52] will speed progress. Another issue, common to almost all aspects of proteomics, is the need for large numbers of high affinity and high specificity protein-binding agents. ChIP to chip assays can only be performed when good antibodies are available. This application is particularly demanding, since the antibody must be capable of recognizing the native protein. Many antibodies, even those that work well for western blots, fail this more rigorous test. Again, however, it is fortunate that major strides are being made in higher-throughput antibody production [53]. Another issue, which is perhaps less obvious, is the need for alternative protein–DNA cross-linking methodologies. The use of formaldehyde has two limitations. First, because this reagent cross-links both proteins with each www.sciencedirect.com
Genomic studies of transcription factor–DNA interactions Sikder and Kodadek 43
other as well as to DNA, ChIP experiments report direct as well as indirect binding of proteins to DNA. A method that produced only protein–DNA cross-links would be a valuable complement, allowing these scenarios to be distinguished. One possibility would be laser-induced UV cross-linking [54] if conditions could be devised to reverse the cross-link, allowing the DNA to be amplified. Another problem with using formaldehyde, a general fixative, is so much material is cross-linked to the DNA that it is extremely difficult to employ restriction enzymes to digest the chromatin sample. This is unfortunate, since cleavage of the chromatin at specific positions would greatly increase the resolution of the technique. Physical shearing of the DNA produces random fragments with an average size of 500 bp, defining the limit of the resolution of the current method. This is a particularly important issue in organisms such as yeast with relatively compact genomes.
Conclusion In summary, the development of the ChIP to chip assay has provided an extraordinarily powerful tool for the analysis of DNA–protein interactions in living cells or tissues on a global scale. In the near future, further advances in microarray construction and the increased availability of useful antibodies will increase the utility of this approach even more. Genomic profiling of transcription factor-binding sites, histone modifications, etc. will almost certainly emerge as a central tool in understanding the systems biology of gene regulation in eukaryotic cells. In addition, studies of the genomic distribution of nuclear proteins that are not sequence-specific DNA binders, such as general transcription machinery, the proteasome and its component pieces, DNA replication and repair complexes, etc. will shed new light on fundamental aspects of basic genome function and maintenance. Already, the realization that the majority of transcription factors examined to date are localized outside of promoter sequences has contributed significantly to our growing realization of the importance of abundant non-coding small RNAs in the cell.
Acknowledgements Work in our laboratory on genomic analysis of transcription proteins was funded by a grant from the National Institute of Health GM066380 and the NHLBI UT-Southwestern Center for Proteomics Research Contract (NO1-HV-28185).
References and recommended reading Papers of particular interest, published within the annual period of review, have been highlighted as: of special interest of outstanding interest 1.
Kodadek T: From carpet bombing to cruise missiles: ‘secondorder’ mechanisms of transcription factor-DNA binding in vivo. Chem Biol 1995, 2:267-279.
2.
Jenuwein T, Allis CD: Translating the histone code. Science 2001, 293:1074-1080.
www.sciencedirect.com
3.
Walter J, Dever CA, Biggin MD: Two homeo domain proteins with similar specificity bind to a wide range of DNA sites in Drosophila embryos. Genes Dev 1994, 8:1678-1692.
4.
Kissinger CR, Liu B, Martin-Blanco E, Kornberg TB, Pabo CO: Crystal structure of an engrailed homeodomain-DNA complex at 2.8 A˚ resolution: a framework for understanding homeodomain-DNA interactions. Cell 1990, 63:579-590.
5.
Vashee S, Xu H, Johnston SA, Kodadek T: How do Zn2Cys6 proteins distinguish between similar upstream activation sites? Comparison of the in vivo and in vitro DNA-binding specificities of the GAL4 protein. J Biol Chem 1993, 268:24699-24706.
6.
Vashee S, Kodadek T: The activation domain of GAL4 protein mediates cooperative promoter binding with general transcription factors in vivo. Proc Natl Acad Sci USA 1995, 92:10683-10687.
7.
Li Q, Johnston SA: Are all DNA binding and transcription regulation by an activator physiologically relevant? Mol Cell Biol 2001, 21:2467-2474.
8.
Kurdistani SK, Grunstein M: In vivo protein–protein and protein– DNA crosslinking for genomewide binding microarray. Methods 2003, 31:90-95.
9.
Horak CE, Snyder M: ChIP-chip: a genomic approach for identifying transcription factor binding sites. Methods Enzymol 2002, 350:469-483.
10. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al.: Genome-wide location and function of DNA-binding proteins. Science 2000, 290:2306-2309. 11. Xiang CC, Kozhich OA, Chen M, Inman JM, Phan QN, Chen Y, Brownstein MJ: Amine-modified random primers to label probes for DNA microarrays. Nat Biotechnol 2002, 20:738-742. 12. Brown PO, Botsein D: Observing the living genome. Nat Genet 1999, 21:33-37. 13. Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SP: Accessing genetic information with high-density DNA arrays. Science 1996, 274:610-614. 14. Rogojina AT, Orr WE, Song BK, Geisert EE: Comparing the use of Affymetrix to spotted oligonucleotide microarrays using two retinal pigment epithelium cell lines. Mol Vis 2003, 9:482-496. 15. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al.: Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004, 116:499-509. The study reveals the functional attributes of non-coding DNA. The results suggest that transcripts from these non-coding regions could be regulated by transcription factor association and external stimuli. 16. Iyer V, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO: Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 2001, 409:533-538. 17. Horak CE, Luscombe NM, Qian J, Bertone P, Piccirrillo S, Gerstein M, Snyder M: Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes Dev 2002, 16:3017-3033. 18. Robyr D, Suka Y, Xenarios I, Kurdistani SK, Wang A, Suka N, Grunstein M: Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell 2002, 109:437-446. By mapping histone hyperacetylation pattern in mutants defective in HDAC activity, the authors map the global sites recruiting for histone deacetylase. 19. Wyrick JJ, Holstege FC, Jennings EG, Causton HC, Shore D, Grunstein M, Lander ES, Young RA: Chromosomal landscape of nucleosome-dependent gene expression and silencing in yeast. Nature 1999, 402:418-421. 20. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298:799-804. Current Opinion in Chemical Biology 2005, 9:38–45
44 Proteomics and genomics
Using 106 tagged transcription factors, this team mapped all the promoters that recruit them. Their work suggest 4000–35 000 regulatory interactions between the various members of the gene network. 21. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001, 292:929-934.
36. Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H: Analysis of the mouse transcriptome based on functional annotation of 60,770 fulllength cDNAs. Nature 2002, 420:563-573. 37. Carrington JC, Ambros V: Role of microRNAs in plant and animal development. Science 2003, 301:336-338.
22. Horak CE, Mahajan MC, Luscombe NM, Gerstein M, Weissman SM, Snyder M: GATA-1 binding sites mapped in the beta-globin locus by using mammalian ChIp-chip analysis. Proc Natl Acad Sci USA 2002, 99:2924-2929.
38. Sutherland HF, Wadey R, McKie JM, Taylor C, Atif U, Johnstone KA, Halford S, Kim UJ, Goodship J, Baldini A et al.: Identification of a novel transcript disrupted by a balanced translocation associated with DiGeorge syndrome. Am J Hum Genet 1996, 59:23-31.
23. Wells J, Yan PS, Cechvala M, Huang T, Farnham PJ: Identification of novel pRb binding sites using CpG microarrays suggests that E2F recruits pRb to specific genomic sites during S phase. Oncogene 2003, 22:1445-1460.
39. Bussemakers M, van Bokhoven A, Verhaegh G, Smit F, Karthaus H, Schalken J, Debruyne F, Ru N, Isaacs W: DD3: a new prostatespecific gene, highly overexpressed in prostate cancer. Cancer Res 1999, 59:5975-5979.
24. Weinmann AS, Yan PS, Oberley MJ, Huang TH, Farnham PJ: Isolating human transcription factor targets by coupling chromatin immunoprecipitation and CpG island microarray analysis. Genes Dev 2002, 16:235-244.
40. Wolf S, Mertens D, Schaffner C, Korz C, Dohner H, Stilgenbauer S, Lichter P: B-cell neoplasia associated gene with multiple splicing (BCMS): the candidate B-CLL gene on 13q14 comprises more than 560 kb covering all critical regions. Hum Mol Genet 2001, 10:1275-1285.
25. Li Z, Van Calcar S, Qu C, Cavenee WK, Zhang MQ, Ren B: A global transcriptional regulatory role for c-Myc in Burkitt’s lymphoma cells. Proc Natl Acad Sci USA 2003, 100:8164-8169. 26. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, Volkert TL, Schreiber J, Rolfe PA, Gifford DK et al.: Control of pancreas and liver gene expression by HNF transcription factors. Science 2004, 303:1378-1381. The only available study done in higher eukaryotes aimed at identifying transcriptional regulatory networks. The authors generated association maps for HNF1a, HNF4a and HNF6 in hepatocytes and pancreatic islets and identified genes regulated by these transcription factors by probing the occupancy of RNA polymerase II. 27. Kirmizis A, Bartley SM, Kuzmichev A, Margueron R, Reinberg D, Green R, Farnham PJ: Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27. Genes Dev 2004, 18:1592-1605. 28. Antequera F, Bird A: Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci USA 1993, 90:11995-11999. 29. Wells J, Graveel CR, Bartley SM, Madore SJ, Farnham PJ: The identification of E2F1-specific target genes. Proc Natl Acad Sci USA 2002, 99:3890-3895. 30. Kondo Y, Shen L, Yan P, Huang T, Issa JP: Chromatin immunoprecipitation microarrays for identification of genes silenced by histone H3 lysine 9 methylation. Proc Natl Acad Sci USA 2004, 101:7398-7403. 31. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M, Weissman S et al.: Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci USA 2003, 100:12247-12252. The theme that emerged from this paper was that the binding sites of P65 were not restricted to promoters but prevalent in intronic and coding regions as well. This suggested that recruitment of the transcription factor alone is not sufficient for transcriptional activation. Another finding related to the broad sequence specificity of NF-kB as 50% of the mapped sites corresponded to non-consensus sites. 32. Weinmann AS: Novel ChIP-based strategies to uncover transcription factor target genes in the immune system. Nat Rev Immunol 2004, 4:381-386. 33. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409:860-921. 34. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420:520-562. 35. Duret L, Dorkeld F, Gautier C: Strong conservation of noncoding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res 1993, 21:2315-2322. Current Opinion in Chemical Biology 2005, 9:38–45
41. Vulliamy T, Marrone A, Goldman F, Dearlove A, Bessler M, Mason PJ, Dokal I: The RNA component of telomerase is mutated in autosomal dominant dyskeratosis congenita. Nature 2001, 413:432-435. 42. Bernstein BE, Humphrey EL, Erlich RL, Schneider R, Bouman P, Liu JS, Kouzarides T, Schreiber SL: Methylation of histone H3 Lys 4 in coding regions of active genes. Proc Natl Acad Sci USA 2002, 99:8695-8700. 43. Grunstein M: Yeast heterochromatin: regulation of its assembly and inheritance by histones. Cell 1998, 93:325-328. 44. Sherman JM, Pillus L: An uncertain silence. Trends Genet 1997, 13:308-313. 45. Panning B, Jaenisch R: RNA and the epigenetic regulation of X chromosome inactivation. Cell 1998, 93:305-308. 46. Jaenisch R: DNA methylation and imprinting: why bother? Trends Genet 1997, 13:323-329. 47. Bernstein BE, Tong JK, Schreiber SL: Genomewide studies of histone deacetylase function in yeast. Proc Natl Acad Sci USA 2000, 97:13708-13713. 48. Wu J, Suka N, Carlson M, Grunstein M: TUP1 utilizes histone H3/ H2B-specific HDA1 deacetylase to repress gene activity in yeast. Mol Cell 2001:117-126. 49. Schubeler D, MacAlpine DM, Scalzo D, Wirbelauer C, Kooperberg C, van Leeuwen F, Gottschling DE, O’Neill LP, Turner BM, Delrow J et al.: The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev 2004, 18:1263-1271. This study shows that hyperacetylation of H3 and H4 and hypermethylation of H3 are modifications that are associated with transcriptional activity of the genes. 50. Roh T-y, Ngau WC, Cui K, Landsman D, Zhao K: High-resolution genome-wide mapping of histone modifications. Nat Biotechnol 2004, 22:1013-1016. 51. Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman MR, Cerrina F: Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 1999, 17:974-978. 52. Luebke KJ, Balog RP, Mittelman D, Garner HR: Digital optical chemistry. In Microfabricated Sensors: Application of Optical Technology for DNA Analysis. Edited by Kordal R, Usmani AM, Law WT: American Chemical Society; 2002 87–106. 53. Chambers RS, Johnston SA: High-level generation of polyclonal antibodies by genetic immunization. Nat Biotechnol 2003, 21:1088-1092. 54. Hockensmith JW, Kubasek WL, Vorachek WR, Evertsz EM, von Hippel PH: Laser cross-linking of protein-nucleic acid complexes. Methods Enzymol 1991, 208:211-236. 55. Zeitlinger J, Simon I, Harbison CT, Hannett NM, Volkert TL, Fink GR, Young RA: Program-specific distribution of a www.sciencedirect.com
Genomic studies of transcription factor–DNA interactions Sikder and Kodadek 45
transcription factor dependent on partner transcription factor and MAPK signaling. Cell 2003, 113:395-404. 56. Moqtaderi Z, Struhl K: Genome-wide occupancy profile of the RNA polymerase III machinery in Saccharomyces cerevisiae reveals loci with incomplete transcription complexes. Mol Cell Biol 2004, 24:4118-4127.
61. Wyrick JJ, Aparicio JG, Chen T, Barnett JD, Jennings EG, Young RA, Bell SP, Aparicio OM: Genome-wide distribution of ORC and MCM proteins in S. cerevisiae: high-resolution mapping of replication origins. Science 2001, 294:2357-2360. 62. Robyr D, Grunstein M: Genomewide histone acetylation microarrays. Methods 2003, 31:83-89.
57. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M, Weissman S et al.: CREB binds to multiple loci on human chromosome 22. Mol Cell Biol 2004, 24:3804-3814.
63. Bernstein BE, Humphrey EL, Erlich RL, Schneider R, Bouman P, Liu JS, Kouzarides T, Schreiber SL: Methylation of histone H3 Lys 4 in coding regions of active genes. Proc Natl Acad Sci USA 2002, 99:8695-8700.
58. Ren B, Cam H, Takahashi Y, Volkert T, Terragni J, Young RA, Dynlacht BD: E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes Dev 2002, 16:245-256.
64. Schubeler D, MacAlpine DM, Scalzo D, Wirbelauer C, Kooperberg C, van Leeuwen F, Gottschling DE, O’Neill LP, Turner BM, Delrow J, Bell SP et al.: The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev 2004, 18:1263-1271.
59. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, Volkert TL, Schreiber J, Rolfe PA, Gifford DK, Fraenkel E et al.: Control of pancreas and liver gene expression by HNF transcription factors. Science 2004, 303:1378-1381.
65. Lieb JD, Liu X, Botstein D, Brown PO: Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet 2001, 28:327-334.
60. Ng HH, Robert F, Young RA, Struhl K: Genome-wide location and regulated recruitment of the RSC nucleosomeremodeling complex. Genes Dev 2002, 16:806-819.
66. Hahn JS, Hu Z, Thiele DJ, Iyer VR: Genome-wide analysis of the biology of stress responses through heat shock transcription factor. Mol Cell Biol 2004, 24:5249-5256.
www.sciencedirect.com
Current Opinion in Chemical Biology 2005, 9:38–45