ChIP-chip: A genomic approach for identifying transcription factor binding sites



[26] ChIP-chip: A Genomic Approach for Identifying Transcription Factor Binding Sites

By CHRISTINE E. HORAK and MICHAEL SNYDER

METHODS IN ENZYMOLOGY, VOL. 350. Copyright 2002, Elsevier Science (USA). All rights reserved. 0076-6879/02 $35.00

Introduction

Transcription factors are key regulatory proteins that can influence the expression of hundreds of genes in response to a particular environmental condition or internal cue. The collective set of genes regulated by a transcription factor (its gene targets) therefore defines the state of a cell and can determine cell fate. Among the 6200 predicted proteins in the yeast Saccharomyces cerevisiae, there are about 300 transcription factors.¹ Approximately 85% of the yeast transcription factors have been characterized to some extent, and some of these are known to play a critical role in cell cycle initiation, pheromone response, mating type switching, pseudohyphal growth, and nutrient and stress response.¹ For most yeast transcription factors no targets are known, and for those factors that have been studied, usually 10 targets or fewer have been found. Thus, identifying the gene targets of transcription factors is an important problem, not only for defining the function of a particular factor, but also for characterizing the molecular response of a cell.

Efforts have been made to comprehensively identify transcription factor targets by using microarray analysis to examine transcript profiles in the absence of a particular transcription factor or in a gain-of-function mutant.²⁻⁸ Although this approach is comprehensive, as it surveys the whole yeast genome, it is not direct. Alterations in transcript levels may result from secondary effects due to the mutation itself or through other downstream regulatory events. Also, some of the gene targets may not be identified because compensatory mechanisms or redundant transcription factors within the cell prevent their expression from changing. Direct methods for target identification that examine transcription factor binding to promoter sequences upstream of a gene, such as in vivo footprinting⁹ or PCR (polymerase chain reaction) analysis of DNA coimmunoprecipitated with a transcription factor,¹⁰,¹¹ are not comprehensive. Typically, only a handful of promoters are examined at a time with these methods.

We have developed an approach in yeast that comprehensively identifies genomic sequences directly bound by transcription factors. This approach has been used successfully to identify targets of the yeast transcription factors SBF (Swi4-Swi6 cell cycle box binding factor) and MBF (MluI cell cycle box binding factor) in Iyer et al.¹² and Gal4 and Ste12 in Ren et al.¹³ We have termed this approach ChIP-chip because it involves chromatin immunoprecipitation of the factor of interest and its associated DNA, followed by DNA chip (microarray) probing with the immunoselected DNA (Fig. 1). Briefly, protein-DNA complexes are fixed in vivo with formaldehyde, the cells are lysed, and the lysate is sonicated to shear DNA. The transcription factor of interest is purified by immunoprecipitation; the associated DNA is extracted and then amplified and labeled for hybridization to a yeast intergenic array. Two important reagents for this method are specific immunoprecipitating antibodies and a yeast intergenic microarray.

FIG. 1. Schematic drawing of the ChIP-chip technique for epitope-tagged DNA-binding proteins. [Figure not reproduced. Recoverable labels: epitope-tagged strain and untagged strain; cross-link protein to DNA; harvest and lyse cells; sonicate to fragment DNA; hybridize to microarray. Key: DNA fragment, proteins, epitope tag, antibody.]

¹ M. C. Costanzo, J. D. Hogan, M. E. Cusick, B. P. Davis, A. M. Fancher, P. E. Hodges, P. Kondu, C. Lengieza, J. E. Lew-Smith, C. Lingner, K. J. Roberg-Perez, M. Tillberg, J. E. Brooks, and J. I. Garrels, Nucleic Acids Res. 28, 73 (2000).
² J. L. DeRisi, V. R. Iyer, and P. O. Brown, Science 278, 680 (1997).
³ P. Sudarsanam, V. R. Iyer, P. O. Brown, and F. Winston, Proc. Natl. Acad. Sci. U.S.A. 97, 3364 (2000).
⁴ J. DeRisi, B. van den Hazel, P. Marc, E. Balzi, P. O. Brown, C. Jacq, and A. Goffeau, FEBS Lett. 470, 156 (2000).
⁵ C. W. Yun, T. Ferea, J. Rashford, O. Ardon, P. O. Brown, D. Botstein, J. Kaplan, and C. C. Philpott, J. Biol. Chem. 275, 10709 (2000).
⁶ T. J. Lyons, A. P. Gasch, L. A. Gaither, D. Botstein, P. O. Brown, and D. J. Eide, Proc. Natl. Acad. Sci. U.S.A. 97, 7957 (2000).
⁷ C. Gross, M. Kelleher, V. R. Iyer, P. O. Brown, and D. R. Winge, J. Biol. Chem. 275, 32310 (2000).
⁸ S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, and I. Herskowitz, Science 282, 699 (1998).
⁹ J. D. Axelrod and J. Majors, Nucleic Acids Res. 17, 171 (1989).
¹⁰ V. Orlando, H. Strutt, and R. Paro, Methods 11, 205 (1997).
¹¹ M. H. Kuo and C. D. Allis, Methods 19, 425 (1999).
¹² V. R. Iyer, C. E. Horak, C. S. Scafe, D. Botstein, M. Snyder, and P. O. Brown, Nature 409, 533 (2001).
¹³ B. Ren, F. Robert, J. J. Wyrick, O. Aparicio, E. G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, T. L. Volkert, C. J. Wilson, S. P. Bell, and R. A. Young, Science 290, 2306 (2000).

Antibodies and Tagged Transcription Factors

The quality of the antibody used for ChIP-chip experiments is important because of the sensitivity of microarray analysis. Nonspecific cross-reactions with chromatin will result in a high level of background hybridization to the microarray. Antibodies should be tested for their ability to specifically immunoprecipitate the transcription factor of interest before being used for chromatin immunoprecipitation. Very few antibodies against yeast transcription factors are available, but the ChIP-chip approach can be generalized to all yeast transcription factors by using proteins that have been tagged with hemagglutinin, myc, or any other epitope for which high-quality antibodies are available. The proteins can be randomly tagged using the transposon tagging method described in Ross-MacDonald et al.¹⁴ or selectively tagged by the homologous recombination method described by Schneider et al.¹⁵ These tagged transcription factors can be immunoprecipitated with commercial antibodies raised against the epitope.

¹⁴ P. Ross-MacDonald, A. Sheehan, G. S. Roeder, and M. Snyder, Proc. Natl. Acad. Sci. U.S.A. 94, 190 (1997).
¹⁵ B. L. Schneider, W. Seufert, B. Steiner, Q. H. Yang, and A. B. Futcher, Yeast 11, 1265 (1995).

Yeast Intergenic Microarray

Another key component of the ChIP-chip technique is the yeast intergenic microarray. Intergenic regions are the regions between open reading frames (ORFs) or other non-ORF features (rRNA coding sequences, tRNA coding sequences, Ty insertions, and centromeric elements) (Fig. 2). Intergenic regions are designated by prefixing the SGD (Saccharomyces Genome Database) designated ORF name or non-ORF feature to its left with an "i." There are approximately 6700 intergenic elements in yeast that range in size from 50 to 5000 bp.

FIG. 2. Examples of intergenic regions on S. cerevisiae Chromosome I. Intergenic regions are segments of DNA between ORFs or non-ORF features (such as Ty insertions, centromeric and telomeric sequences, and rRNA and tRNA coding sequences). These sequences are given the name of the SGD designated ORF or non-ORF feature to its left prefixed by the letter "i." The intergenic region iCEN1 is to the left of the centromere CEN1. Intergenic region iYARCdelta5 is to the left of the long terminal repeat of the Ty insertion YARCdelta5. Intergenic region itA(UGC)A is to the left of the tRNA coding sequence for tA(UGC)A, and iYAR014c is to the left of the YAR014c ORF. [Figure not reproduced. Recoverable labels: iCEN1, iYARCdelta5, tA(UGC)A, itA(UGC)A, iYAR014c; scale bar, 100 bp.]

Since the intergenic regions generally contain the gene promoter sequences and transcription factor binding sites, use of this type of microarray will provide the most complete survey of transcription factor-associated DNA. However, depending on the extent of DNA shearing, there may be enough sequence overlap between the immunoprecipitated DNA fragments and the affixed ORF probes to allow a traditional ORF microarray to be used for hybridization and target identification.¹² The protocols herein describe a method for yeast transcription factor target identification using the yeast intergenic array.
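The naming convention can be illustrated with a small sketch. It follows the examples given in the Fig. 2 legend, in which region iX lies immediately to the left of feature X. The function name, the tuple layout, the placeholder feature `left_neighbor`, and all coordinates are invented for illustration and are not taken from SGD.

```python
def name_intergenic_regions(features):
    """Name the gaps between annotated features on one chromosome.

    features: iterable of (name, start, end) with 1-based inclusive
    coordinates (hypothetical layout). Each gap is named after the
    feature on its right, prefixed with "i", as in the Fig. 2 examples.
    Returns a list of (region_name, start, end).
    """
    regions = []
    covered_to = None  # right-most coordinate covered by features so far
    for name, start, end in sorted(features, key=lambda f: f[1]):
        # A gap exists only if this feature starts beyond everything seen so far.
        if covered_to is not None and start > covered_to + 1:
            regions.append(("i" + name, covered_to + 1, start - 1))
        covered_to = end if covered_to is None else max(covered_to, end)
    return regions


# Invented coordinates, used only to show the naming pattern.
example_features = [
    ("left_neighbor", 1, 100),
    ("CEN1", 151, 270),
    ("YARCdelta5", 400, 700),
    ("tA(UGC)A", 900, 972),
    ("YAR014c", 1500, 3600),
]

example_regions = name_intergenic_regions(example_features)
```

Overlapping or nested features produce no gap in this sketch, so no region is named between them.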


Protocols

Constructing DNA Microarray of Yeast Intergenic Regions

Constructing microarrays requires an arrayer in a room or chamber that is temperature and humidity controlled. Arrayers are available through several robotics companies, such as Engineering Services Inc. (ESI, Toronto, Canada) or Genemachines (San Carlos, CA), or can be built using the MGuide accessed through the http://cmgm.stanford.edu/pbrown Web site. This Web site also provides links to other general microarray protocols. Further information regarding microarray experiments is provided elsewhere in this volume.¹⁵ᵃ

The yeast intergenic regions can be PCR amplified with primers available through Research Genetics (Huntsville, AL). Regions that are larger than 1500 bp are subdivided into two or more fragments for amplification purposes. The intergenic regions are amplified and arrayed onto slides as described below. These methods are based on previously developed procedures.²

1. The forward and reverse primers for PCR amplification from Research Genetics are premixed in 65 96-well plates. Maintain the reactions in the same 96-well arrangement. For a 50-μl reaction, the PCR conditions are as follows: 25 μl 2× Mastermix (Qiagen, Valencia, CA), 4 μl primer mix, 2 μl yeast genomic DNA (100 ng/μl), and 19 μl water. The thermal cycling conditions are as follows: 94° for 5 min; 40 cycles of 92° for 10 sec, 55° for 30 sec, and 72° for 2 min; and a final extension step at 72° for 7 min. PCR products are subsequently analyzed by gel electrophoresis on a 1.5% agarose gel (see below).

2. Precipitate the amplification products with 100 μl 100% ethanol and centrifugation at 3000 rpm for 1 hr at 4°. Decant the supernatant, dry the pellets, and resuspend them in 100 μl water. Plates of PCR products can be stored indefinitely at -70°.

3. Transfer DNA from the 65 96-well plates to 19 384-well plates (Whatman Polyfiltronics, Clifton, NJ) for arraying purposes. Using a Tecan robot, aliquot 4 μl of DMSO (dimethyl sulfoxide) into each well of the 384-well plates and then add 4 μl of each amplified intergenic region.

4. Prepare a control plate, which includes repeated 4-μl aliquots of water (no DNA), heterologous DNA (for example, a PCR-amplified region of the human β-globin gene) as a control for hybridization specificity, and sonicated total yeast genomic DNA as a positive control. Each control element is mixed with 4 μl DMSO in a 384-well plate.

5. Spot DNA onto slides using an arrayer and microspotting pins. Pins are available from Array-It (TeleChem International, Sunnyvale, CA). Slides coated with γ-aminopropylsilane from Corning Microarray Technology (CMT, Corning, NY) are recommended. While spotting, the temperature and humidity should be maintained at 25° and 50%, respectively.

6. Cross-link DNA to the slide surface using a Stratalinker UV cross-linker (Stratagene, La Jolla, CA) at 200 mJ. Store the slides at room temperature in a dry place.

Troubleshooting for PCR of Intergenic Regions. Twenty percent of each PCR product should be analyzed by agarose gel electrophoresis. PCR products should be scored on the basis of their size and extent of amplification. Amplification product sizes are available through the Research Genetics ftp address (ftp://ftp.resgen.com). Incorrectly sized fragments should be noted. Failed and faint PCR reactions should be repeated using a lower annealing temperature (50°). Correctly sized products from this second round of PCR should be placed in their original position in the 96-well plate after ethanol precipitation.

Alternative Procedures for Microarraying. Polylysine-coated slides can also be used for microarraying, but we have found that the polylysine coat (and adhered spots of DNA) may tear away from the glass surface of the slide, rendering the microarray useless. Other methods involve microspotting with DNA resuspended in 3× SSC, but we have found that DMSO helps prevent evaporation of the solution from the source plate and from the pins during the arraying process and results in more DNA adhering to the slide.

¹⁵ᵃ A. P. Gasch, Methods Enzymol. 350, [23], 2002 (this volume).

Immunoprecipitation of DNA Associated with Transcription Factor

Protein-DNA interactions can be reversibly cross-linked with formaldehyde,¹⁶,¹⁷ and these in vivo chromatin complexes can be isolated by immunoprecipitation.¹⁸,¹⁹ A schematic of this chromatin immunoprecipitation procedure is shown in Fig. 1. DNA is immunoprecipitated from strains carrying a tagged version of a transcription factor, and because the antibody may cross-react with other components of chromatin, DNA should also be purified from an isogenic untagged strain. If an antibody against the native protein is used, perform a mock immunoprecipitation from a strain lacking the gene for that transcription factor. The following protocol is adapted from methods described by Aparicio,²⁰ Hecht and Grunstein,¹⁸ Kuo and Allis,¹¹ and Orlando et al.¹⁰ Steps of this procedure may need to be optimized for particular proteins.

1. Grow a 50-ml yeast culture to mid-log phase (OD600 of 0.5). The protein-DNA complexes are fixed in vivo by adding the yeast culture to 1.4 ml 37% (v/v) formaldehyde solution in a 50-ml conical tube to a final concentration of 1% (v/v) formaldehyde. Fix the cells for 15 min at room temperature, occasionally inverting the tube.

2. Quench the cross-linking by adding glycine to the fixed culture at a final concentration of 125 mM and incubating at room temperature for 5 min with occasional inversion.

3. Harvest cells by centrifugation at 3000 rpm for 5 min at room temperature. Discard the supernatant and wash the cells with 10 ml ice-cold 1× TBS (150 mM NaCl, 20 mM Tris-HCl, pH 7.6). Spin down the cells again at 3000 rpm for 5 min at 4°, resuspend them in 1 ml ice-cold 1× TBS, and transfer them to a 2-ml microcentrifuge tube (the 2-ml microcentrifuge tube allows for maximum bead movement while vortexing). Centrifuge cells at 14,000 rpm for 1 min at 4° and then discard the supernatant. Cells can be kept on ice for several hours at this point. Alternatively, cell pellets can be frozen in liquid nitrogen and stored at -70°.

4. Resuspend the cell pellet in 400 μl ice-cold lysis buffer [0.1% (w/v) deoxycholic acid, 1 mM EDTA, 50 mM HEPES-KOH, pH 7.5, 140 mM NaCl, 1% (v/v) Triton X-100] with protease inhibitors. Add an equal volume of cubic zirconium beads and vortex the mixture at maximum force for 10 min at 4°. Let the samples sit on ice for 15 min before transferring the lysate to a fresh 2-ml microcentrifuge tube. Wash the cubic zirconium beads with 400 μl ice-cold lysis buffer with protease inhibitors and vortex for 1 min. Add the wash to the lysate.

5. Shear chromatin using a Branson 250 Sonifier with a microtip at a power setting of 1.5 and a 100% duty cycle. Sonicate extracts for three 10-sec pulses. Between pulses, let samples sit on ice for at least 2 min. This should shear chromatin to a final average size of 500 bp. Sonicators can be calibrated to yield the desired DNA length (see below).

6. Clarify the lysate by centrifugation at 14,000 rpm for 15 min at 4°. Transfer the supernatant to a fresh 1.5-ml microcentrifuge tube.

7. Preclear the extracts to remove fragments of DNA that may adhere to the protein A-Sepharose beads in the immunoprecipitation step. Incubate the extracts with a bed volume of 30 μl protein A-Sepharose beads (Pierce, Rockford, IL) on a rotation wheel for 1 hr at 4°. Centrifuge at 7000 rpm for 2 min and transfer the lysate to a fresh tube.

8. To immunoprecipitate the transcription factor, add the primary antibody against the protein of interest or against the epitope tag to the lysate (see below for determining antibody amounts). Incubate the lysate with the antibody on ice for 3 hr and then add a 30-μl bed volume of protein A-Sepharose beads. Incubate on a rotating wheel for 1 hr at 4°. Centrifuge samples for 2 min at 7000 rpm at 4°.

9. Wash the immunoprecipitates four times for 5 min each on a rotation wheel at 4°. After each wash, collect the beads by centrifugation at 7000 rpm for 2 min at 4° and discard the wash buffer. The first wash is with 1 ml lysis buffer. The second wash is with 1 ml lysis buffer-500 (0.1% deoxycholic acid, 1 mM EDTA, 50 mM HEPES-KOH, pH 7.5, 500 mM NaCl, 1% Triton X-100). The beads are then washed with 1 ml LiCl/detergent solution [0.5% (w/v) deoxycholic acid, 1 mM EDTA, 250 mM LiCl, 0.5% (v/v) Nonidet P-40 (NP-40), 10 mM Tris-HCl, pH 8] and finally with 1 ml 1× TBS.

10. Elute the immunocomplexes from the Sepharose beads by adding 100 μl 1% (w/v) SDS/1× TE and incubating at 65° for 10 min. Spin the beads down briefly and transfer the eluate to a fresh tube. Wash the beads with 150 μl 0.67% (w/v) SDS/1× TE and then add the wash to the eluate.

11. To reverse the cross-links, incubate the immunoprecipitates and the total extract material at 65° for at least 6 hr. It is often convenient to extend this step overnight.

12. To further remove protein from the DNA, treat with proteinase K.¹⁶ Add 250 μl proteinase K solution (20 μg glycogen/100 μg proteinase K/1× TE) to each sample and incubate for 2 hr at 37°.

13. Purify the DNA by phenol-chloroform extraction and ethanol precipitation as described by Orlando and Paro.²¹ Add 55 μl 4 M LiCl and 500 μl phenol/chloroform/isoamyl alcohol [25:24:1 (v/v)] to each sample and then vortex vigorously for 1 min. Separate phases by centrifugation at 14,000 rpm for 10 min at room temperature. Transfer the aqueous phase to a fresh tube and add 1 ml 100% ethanol to precipitate the DNA. Mix the samples and centrifuge at 14,000 rpm for 20 min at 4°. Discard the supernatant and dry the pellet. Resuspend the DNA in 1× TE and store at -20°.

¹⁶ M. J. Solomon and A. Varshavsky, Proc. Natl. Acad. Sci. U.S.A. 82, 6470 (1985).
¹⁷ M. J. Solomon, P. L. Larsen, and A. Varshavsky, Cell 53, 937 (1988).
¹⁸ A. Hecht and M. Grunstein, Methods Enzymol. 304, 399 (1999).
¹⁹ S. Strahl-Bolsinger, A. Hecht, K. Luo, and M. Grunstein, Genes Dev. 11, 83 (1997).
²⁰ O. M. Aparicio, in "Current Protocols in Molecular Biology" (F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl, eds.), p. 21.3.1. John Wiley and Sons, New York, 1999.
²¹ V. Orlando and R. Paro, Cell 75, 1187 (1993).
²² V. Orlando, TIBS 25, 99 (2000).

Troubleshooting. The above procedure should yield approximately 10 ng of immunopurified DNA. The DNA can be quantitated with a fluorimeter or with ethidium bromide spotting and assayed for enrichment of transcription factor-bound sequences by a PCR assay¹⁰,¹¹ or microarray hybridization (see below). If too little DNA is purified or the DNA is not enriched for a subset of sequences, several parameters of the chromatin immunoprecipitation procedure can be altered.

a. The extent of cross-linking can be adjusted by changing the concentration of formaldehyde, the time of incubation with the cross-linking agent, or the temperature of cross-linking.²² The extent of cross-linking is critical and can depend on the sensitivity of the protein. Too much cross-linking may mask epitopes and too little cross-linking may lead to incomplete fixation.²²

b. The extent of sonication should also be monitored, since chromatin fragments that are too large may not be readily immunoprecipitated. Set aside a 50-μl aliquot of the lysate before immunoprecipitating for determining the average size of sheared DNA. Add 200 μl 1% (w/v) sodium dodecyl sulfate (SDS)/1× TE (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) to this total cell lysate and then treat this sample in parallel with the immunoprecipitates through steps 11-13. This total lysate DNA can then be analyzed by gel electrophoresis to determine the extent of shearing. A 2% agarose gel or 6% polyacrylamide gel is recommended for this analysis. If DNA fragments larger than 2 kb are observed, increase the number of sonication pulses.

c. The amount of antibody used for immunoprecipitation is another critical parameter. Preliminary immunoprecipitation experiments should be performed to determine the appropriate amount of antibody to be used for specific purification of the protein of interest. For the mouse monoclonal anti-hemagglutinin (HA) antibody 12CA5 (BabCo, Richmond, CA), a 1:200 dilution of the serum works well for immunoprecipitation of most proteins tagged with three copies of the hemagglutinin epitope. To ensure that the cross-linking is not rendering the protein refractory to immunoprecipitation, 10% of the material eluted from the beads in step 10 can be analyzed by SDS-PAGE and Western blotting. The material should be boiled in sample buffer for 20 min before SDS-PAGE.
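When the formaldehyde concentration is adjusted as suggested in item (a), the volume of stock to add follows from a simple dilution calculation. The sketch below is only a worked illustration of that arithmetic; the function names are hypothetical, and the only values taken from the protocol are the 50-ml culture, the 37% stock, and the 1.4-ml addition of step 1.

```python
def final_percent(sample_ml, stock_ml, stock_percent):
    """Final concentration (%) after adding stock_ml of stock to sample_ml."""
    return stock_percent * stock_ml / (sample_ml + stock_ml)


def stock_volume_ml(sample_ml, stock_percent, target_percent):
    """Volume of stock to add to sample_ml to reach target_percent.

    Solves stock_percent * v = target_percent * (sample_ml + v) for v.
    """
    if not 0 < target_percent < stock_percent:
        raise ValueError("target must be positive and below the stock concentration")
    return sample_ml * target_percent / (stock_percent - target_percent)


# Step 1 of the protocol: 50 ml culture plus 1.4 ml of 37% formaldehyde.
protocol_final = final_percent(50.0, 1.4, 37.0)

# Volume of 37% stock that gives exactly 1% in a 50 ml culture.
exact_volume = stock_volume_ml(50.0, 37.0, 1.0)
```

With these inputs the protocol's 1.4 ml gives a final concentration very close to 1%, and the exact volume for 1% is just under 1.4 ml, so the two agree within pipetting precision.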

Labeling Probe DNA

Chromatin immunoprecipitation yields a very small amount of DNA (about 1-10 ng), so all of the precipitated DNA must be used in the labeling steps that follow. Because so little DNA is immunoprecipitated, a robust PCR amplification is required both before labeling and during the labeling step, which differs from previous methods of microarray probe labeling. DNA immunoprecipitated from the tagged and untagged strains is labeled with different fluorophores (Cy3, Cy5). Use similar amounts of DNA for each of these labeling reactions. Labeling involves three stages of PCR, using random nonamers with fixed-sequence linkers and fluorophore-conjugated dCTP. The labeling procedure is adapted from that used in Iyer et al.¹² Other probe preparation protocols are available at http://cmgm.stanford.edu/pbrown.

1. Immunopurified DNA is brought to a final volume of 8 μl in 1× TE, and 1 μl 10× T7 DNA polymerase buffer (USB) and 1 μl 50 μM Primer 1 (5'-GTTTCCCAGTCACGATCNNNNNNNNN-3') are added to it. The DNA is denatured in a thermal cycler at 94° for 2 min and then cooled to 8°, at which point 5 μl of reaction mix 1 [1× T7 DNA polymerase buffer, 0.15 mM dNTPs, 0.015 M DTT, 0.75 μg BSA, 4 U T7 DNA polymerase (USB, Cleveland, OH)] is added to the tube. The reaction mix is maintained at 8° for 2 min and then ramped to 37° over 8 min. The DNA is allowed to polymerize for 8 min at 37° and is then denatured at 94° for 2 min. The reaction tubes are cooled to 8° and 1 μl of reaction mix 2 [0.4 U T7 DNA polymerase in dilution buffer (USB)] is added. The reaction is held at 8° for 2 min, ramped to 37°, and then held at 37° again for 8 min. The products are diluted with 35 μl 1× TE.

2. The diluted products of the first stage of DNA synthesis are then used as a template in the second stage of PCR amplification. Each 100-μl reaction contains 15 μl of diluted product from step 1, 10 μl 10× PCR buffer (Qiagen), 2.5 μl 10 mM dNTPs, 2 μl 25 mM MgCl2, 2.5 μl 50 μM Primer 2 (5'-GTTTCCCAGTCACGATC-3'), 1 μl 5 U/μl Taq polymerase (Qiagen), and 67 μl water. Thermal cycling conditions are as follows: 25 cycles of 92° for 30 sec, 40° for 30 sec, 50° for 30 sec, and 72° for 1 min.

3. A fraction of the PCR products from step 2 is used as template in the third stage of PCR amplification. For a 50-μl reaction, the PCR conditions are as follows: 15 μl PCR product from step 2, 5 μl 10× PCR buffer (Qiagen), 0.5 μl F-33× dNTP stock solution, 1 μl 50 μM Primer 2, 1 μl 25 mM MgCl2, 0.5 μl 5 U/μl Taq polymerase (Qiagen), and 24 μl water. The F-33× dNTP stock solution contains 3.3 μl 100 mM dATP, 3.3 μl 100 mM dGTP, 3.3 μl 100 mM dTTP, 1.7 μl 100 mM dCTP, and 28.4 μl 1× TE. After these reaction components are mixed, add 1 μl of Cy3- or Cy5-conjugated dCTP (Amersham, Piscataway, NJ) to the 50-μl reaction. Thermal cycling conditions are identical to those in step 2.

4. The labeled DNA should be purified immediately following the labeling reaction. Cy3- and Cy5-labeled DNA probes are combined and purified using the Qiagen PCR purification system according to the manufacturer's instructions. The DNA is eluted from the spin column with 25 μl elution buffer, with an incubation at room temperature for 1 min before the final centrifugation step. The eluted probe mixture should be visibly purple in color.
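When several samples are amplified in parallel, it can help to tabulate a recipe once and scale it. The following is a minimal sketch that uses the step 2 reaction above as its example. The helper names and the 10% pipetting overage are assumptions made for illustration and are not part of the published protocol.

```python
# Step 2 amplification reaction, volumes in microliters per reaction.
STEP2_PCR_UL = {
    "diluted product from step 1": 15.0,
    "10x PCR buffer": 10.0,
    "10 mM dNTPs": 2.5,
    "25 mM MgCl2": 2.0,
    "50 uM Primer 2": 2.5,
    "5 U/ul Taq polymerase": 1.0,
    "water": 67.0,
}


def total_volume_ul(recipe):
    """Sum of all component volumes in one reaction."""
    return sum(recipe.values())


def scale_recipe(recipe, n_reactions, overage=0.10):
    """Scale every component for n_reactions plus a fractional overage.

    The overage (default 10%) is an assumed allowance for pipetting loss.
    In practice the template would be added per tube rather than pooled.
    """
    if n_reactions < 1 or overage < 0:
        raise ValueError("need at least one reaction and a non-negative overage")
    factor = n_reactions * (1.0 + overage)
    return {name: volume * factor for name, volume in recipe.items()}


single_total = total_volume_ul(STEP2_PCR_UL)   # listed components add up to 100 ul
ten_reactions = scale_recipe(STEP2_PCR_UL, 10)
```

Summing the components is a quick consistency check on any recipe before it is scaled; here the listed volumes add up to the stated 100-μl reaction.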

Troubleshooting. If the purified probe mix is colorless or only one of the Cy dyes (pink or blue) is visible, the Cy-conjugated nucleotide probably has to be replaced. Some batches of Cy dye do not incorporate well. When repeating step 3 above, use the same template generated by PCR step 2, but a different vial of Cy-dCTP with a different batch number. If problems persist, a protocol involving aminoallyl coupling, adapted from a protocol available through http://cmgm.stanford.edu, can be followed in lieu of steps 3 and 4.

3. All PCR reaction components are identical to those in step 3 above except that, in place of F-33× dNTPs, 0.5 μl of a 50× dNTP mix using a 4:1 ratio of aminoallyl-dUTP (Sigma, St. Louis, MO) to dTTP is used and no Cy-conjugated nucleotides are added. The 50× dNTP mix contains 10 μl each of 100 mM dATP, dCTP, and dGTP, 2 μl 100 mM dTTP, and 8 μl 100 mM aminoallyl-dUTP. Aminoallyl-dUTP is dissolved in 0.041 N NaOH (1 mg/18 μl), so that the final pH is roughly 7.

4. All traces of Tris must be removed from the PCR products, because its free amine groups would interfere with the coupling reaction below (step 5); the products are therefore purified and washed with neutralized water. (Note: The pH of the water must be approximately 7 in order for the DNA to adhere to the purification filter.) Dilute the PCR reactions with 450 μl water and then transfer the diluted products to Microcon-30 filters (Amicon, Danvers, MA). Spin at 14,000 rpm for 7 min at room temperature and discard the flow-through. Wash the filters twice with 450 μl neutralized water, repeating the centrifugation each time. Concentrate the volume to 10 μl with one or two 1-min centrifugations at 14,000 rpm. Collect the sample by inverting the filter into a fresh microfuge tube. Samples can be stored at -20° indefinitely.

5. Before coupling the monofunctional NHS-ester Cy3 (Amersham) and Cy5 (Amersham) to the aminoallyl-incorporated DNA, add 0.5 μl 1 M sodium bicarbonate, pH 9. The Cy dyes are supplied as a dry pellet. Dissolve the pellet in 20 μl DMSO and divide it into ten 2-μl aliquots, which are then dried under vacuum and stored at 4°. (Note: Do not store the dyes in water or DMSO, as they are rapidly hydrolyzed in water.) Add the 10.5-μl reaction to the desiccated dye pellet and incubate the coupling reaction for 1 hr at room temperature in the dark. Mix by tapping the reaction tube approximately every 15 min.

6. Quench the coupling with 4.5 μl 4 M hydroxylamine and incubate for 15 min in the dark. Combine the Cy3 and Cy5 reactions and purify with the Qiagen PCR purification kit according to the manufacturer's specifications. Elute the labeled DNA with 25 μl elution buffer, incubating at room temperature for 1 min before the final centrifugation step. Again, the probe mixture should be visibly purple in color.

Another potential explanation for faint or colorless labeled DNA is that the concentration of template DNA used in the labeling step is too low. This problem may be corrected in one of three ways: first, try to immunoprecipitate more DNA by starting with a larger culture of yeast cells; second, increase the number of amplification cycles in steps 2 and 3 of the labeling procedure up to 28 cycles, which is still within the linear range; and third, purify and concentrate the reaction products of step 1 or 2 of the labeling procedure for use in the steps that follow.
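The nucleotide concentrations implied by the two dNTP stocks described above (the F-33× stock and the 50× aminoallyl mix) can be checked with a short calculation. This is a sketch only: the function name and data layout are hypothetical, and it assumes the 50× mix contains nothing beyond the listed nucleotide volumes.

```python
def mix_concentrations_mM(components, diluent_ul=0.0):
    """Concentration of each component after pooling.

    components: name -> (volume_ul, stock_mM)
    diluent_ul: volume of buffer or water added on top (e.g. 1x TE)
    """
    total_ul = sum(volume for volume, _ in components.values()) + diluent_ul
    return {name: volume * stock / total_ul
            for name, (volume, stock) in components.items()}


# F-33x stock from step 3 of the standard labeling procedure (28.4 ul 1x TE as diluent).
F33X = {
    "dATP": (3.3, 100.0),
    "dGTP": (3.3, 100.0),
    "dTTP": (3.3, 100.0),
    "dCTP": (1.7, 100.0),
}

# 50x aminoallyl mix from the alternative step 3 (assumed: no added diluent).
AA50X = {
    "dATP": (10.0, 100.0),
    "dCTP": (10.0, 100.0),
    "dGTP": (10.0, 100.0),
    "dTTP": (2.0, 100.0),
    "aminoallyl-dUTP": (8.0, 100.0),
}

f33x_conc = mix_concentrations_mM(F33X, diluent_ul=28.4)
aa50x_conc = mix_concentrations_mM(AA50X)

# Ratio of aminoallyl-dUTP to dTTP in the 50x mix.
aa_to_t_ratio = aa50x_conc["aminoallyl-dUTP"] / aa50x_conc["dTTP"]
```

Under these assumptions the F-33× stock carries roughly half as much dCTP as each of the other nucleotides, which leaves room for the Cy-conjugated dCTP, and the aminoallyl mix reproduces the stated 4:1 ratio of aminoallyl-dUTP to dTTP.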

Hybridization

The following hybridization protocol yields the best results for arrays printed on CMT-GAPS coated slides with DMSO (as described above). Other hybridization protocols are available for slides coated with polylysine or slides printed with 3× SSC through various Web sites (http://www.cmt.corning.com, http://sequence.aecom.yu.edu, http://www.microarrays.org).

1. The microarray slide is first prehybridized by adding 50 μl prehybridization buffer [25% (v/v) formamide, 5× SSC, 0.1% (w/v) SDS, 1% (w/v) bovine serum albumin (BSA)] to the printed side of the array and then covering with a coverslip that has been rinsed with water and then with ethanol and air-dried. The slide is placed into a HybChamber (Genemachines, San Carlos, CA) according to the manufacturer's instructions and incubated at 42° for 1 hr. The slide is then removed, rinsed in water, and dried by centrifugation at 500 rpm for 5 min at room temperature.


2. An equal volume of 2× hybridization buffer (50% formamide, 10× SSC, 0.2% SDS) is added to the purified probe; the hybridization mix is subsequently boiled for 5 min and then centrifuged for 2 min at 14,000 rpm at room temperature. The probe should not be placed on ice. The hybridization mix is then pipetted onto the prehybridized array and covered with a coverslip that has again been rinsed with water and ethanol. The slide is incubated at 42° in a HybChamber for 12-16 hr.

3. When washing, separate wash containers should be used for slides hybridized with different probes, and slides should not be allowed to dry between washes. The coverslip is first removed by immersing the slide in 2× SSC/0.1% SDS at 42° until the coverslip moves freely away from the slide. The slide is then washed in 2× SSC/0.1% SDS for 5 min at 42° and then in 0.1× SSC/0.1% SDS for 10 min at room temperature. The array is washed five times in 0.1× SSC for 1 min each and then rinsed in water and dried by centrifugation at 500 rpm for 5 min at room temperature. The slide is then ready for scanning.

Troubleshooting. If, on scanning, the hybridization signals are generally weak (low fluorescent intensities), the signal may be increased by altering several parameters in the labeling procedure (see above). Nonspecific hybridization may be a concern if a high signal is detected for spots containing no DNA or heterologous DNA. The specificity may be increased by prehybridizing for longer periods of time. Alternatively, the hybridization temperature can be increased.

Scanning and Analysis

The slide is scanned to determine the Cy3 and Cy5 fluorescence intensity for each array element. The ratio of the signal from the fluorophore labeling the transcription factor-associated DNA (usually Cy5) to the signal from the fluorophore labeling the background DNA (usually Cy3) represents the extent of binding of the transcription factor to the specific genomic fragment. The background is mock-immunoprecipitated DNA, so that genomic fragments that are nonspecifically precipitated will not be deemed binding targets. Background-subtracted data are normalized and scaled. Data from two or three replicate experiments should be analyzed to identify those intergenic regions that are consistently and significantly enriched by chromatin immunoprecipitation and are therefore candidate binding targets of the transcription factor of interest. The ratios for individual intergenic regions can vary significantly between duplicate experiments; therefore, the results are treated as binary: fragments that are bound versus those that are not bound. The affinity of a transcription factor for its target DNA cannot be determined with the ChIP-chip assay, as the ratios may be a function of the sensitivity of certain regions of chromatin to fixation or immunoprecipitation as well as of promoter context. The analysis steps below are adapted from the methods used in Iyer et al.¹²
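The logic of this analysis (background subtraction, a Cy5/Cy3 ratio per element, normalization, and a binary call that must hold across replicates) can be sketched as follows. This is not the authors' procedure: the field names, the median normalization, and the twofold cutoff are assumptions chosen for illustration, and real data would come from the scanner software's results file.

```python
from statistics import median


def spot_ratios(spots):
    """Background-subtracted Cy5/Cy3 ratio for each usable spot.

    spots: list of dicts with hypothetical keys
      name, f635, b635 (Cy5 signal, background), f532, b532 (Cy3 signal, background)
    Spots whose net signal is not positive in both channels are dropped.
    """
    ratios = {}
    for spot in spots:
        cy5 = spot["f635"] - spot["b635"]
        cy3 = spot["f532"] - spot["b532"]
        if cy5 > 0 and cy3 > 0:
            ratios[spot["name"]] = cy5 / cy3
    return ratios


def median_normalize(ratios):
    """Scale ratios so the median is 1 (assumes most regions are not bound)."""
    center = median(ratios.values())
    return {name: value / center for name, value in ratios.items()}


def consistent_hits(replicates, min_ratio=2.0):
    """Regions at or above min_ratio in every replicate (a binary bound/not-bound call)."""
    if not replicates:
        return []
    normalized = [median_normalize(spot_ratios(rep)) for rep in replicates]
    shared = set.intersection(*(set(n) for n in normalized))
    return sorted(name for name in shared
                  if all(n[name] >= min_ratio for n in normalized))
```

Requiring a region to pass in every replicate reflects the point made above that individual ratios vary between experiments, so only the bound versus not-bound call is carried forward.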


1. Create a list of array elements and their corresponding addresses on the microarray from the list of intergenic regions in the 96-well format supplied by Research Genetics. The 96-well address (plate number, well designation) should be deconvoluted stepwise to the 384-well address (plate number, well designation) and then to the microarray address (block number, column number, row number). A deconvolution program, such as CloneTracker (BioDiscovery, Los Angeles, CA), can be used, or a program can be created in Microsoft Excel for this purpose. Add an appropriate header to the array list and convert the list to a tab-delimited text file so that it can be read by the scanner software.
2. Open Genepix Pro version 3.0 software (Axon Instruments, Foster City, CA) and scan the slide at the appropriate photomultiplier tube (PMT) voltage for both the 532 nm (Cy3 excitation wavelength) and 635 nm (Cy5 excitation wavelength) lasers using a Genepix 4000A scanner (Axon Instruments). (Note: Cy3-dCTP generally incorporates less efficiently than Cy5-dCTP. Compensate for this effect by increasing the PMT voltage for the 532 nm channel during scanning.) The optimal PMT voltage for each channel is one at which the maximum number of array elements is visible without saturation. White pixels represent saturation.
3. Load the array list generated in step 1. Align the blocks to the spot features on the scanned image. The Genepix software will define all of the array elements, or intergenic regions. When the Analyze button is pressed, the software will also compute the fluorescence intensity of each array element in both channels and calculate the ratio of fluorescence intensities for the red channel (Cy5) over the green channel (Cy3). The raw data and subsequent computations are sent to the Results tab spreadsheet.
4. Filter the data by flagging spots with obvious defects and spots whose total fluorescence intensities are below an empirically determined threshold for both channels.
5. Genepix Pro 3.0 does not normalize the generated data, but it does compute a normalization factor for each of the ratio quantities. Once the analysis of the image has been completed, the results can be saved as a tab-delimited text file, reopened, and normalized in Microsoft Excel. To normalize, the ratio of medians (pixel-by-pixel ratio) for each intergenic region is simply divided by the normalization factor given in the data header for the ratio of medians. Other programs, such as GeneSpring (Silicon Genetics, San Carlos, CA), are available for normalization of microarray data.
6. The data must also be scaled before they can be compared with duplicate ChIP-chip experiments or contrasted with ChIP-chip experiments using different conditions or different transcription factors. Scaling can also be performed in Microsoft Excel by calculating the PERCENTRANK, or percentile ranking (a statistical function), of the normalized ratio of medians for each intergenic region. When computing the PERCENTRANK, the array or range of data is zero to the maximum quantity within the normalized ratio of medians column for that experiment and "x" is the normalized ratio for each intergenic region.
7. In Microsoft Excel, combine the scaled data with the processed data from at least one other identical experiment to identify intergenic regions that are consistently enriched in immunoprecipitates with the transcription factor of interest. Find the average percentile rank for each intergenic region and then sort the average percentile ranks in ascending order. Use the Histogram analysis tool to bin the ranks. Make 100 bins from 0.01 to 1.0 and then chart the distribution of the ranks using the column graph type option from the Chart Wizard. The distribution should be bimodal, with candidate gene targets represented by the high percentile ranks that fall outside the Gaussian distribution.
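Steps 5 to 7 can also be carried out outside a spreadsheet. The Python sketch below is a minimal illustration under stated assumptions, not the procedure used by the authors: the function names, region names, ratios, and normalization factors are hypothetical; the percentile rank is computed as the fraction of the other values that are strictly smaller, which is a common definition but may differ from a spreadsheet implementation in tie handling and rounding; and the bins are half-open intervals of width 0.01, which may place values lying exactly on a bin edge differently from the Histogram tool.

```python
from bisect import bisect_left


def normalize(ratios, norm_factor):
    """Step 5: divide each ratio of medians by the normalization factor."""
    return {region: r / norm_factor for region, r in ratios.items()}


def percent_rank(values):
    """Step 6: map each region to the fraction of other values strictly below it."""
    ordered = sorted(values.values())
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two regions to rank")
    return {region: bisect_left(ordered, v) / (n - 1) for region, v in values.items()}


def average_ranks(replicates):
    """Step 7: average the percentile rank of regions present in every replicate."""
    shared = set(replicates[0]).intersection(*replicates[1:])
    return {
        region: sum(rep[region] for rep in replicates) / len(replicates)
        for region in shared
    }


def histogram(avg_ranks, n_bins=100):
    """Step 7: count average ranks in n_bins equal-width bins spanning 0 to 1."""
    counts = [0] * n_bins
    for rank in avg_ranks.values():
        index = min(int(rank * n_bins), n_bins - 1)  # a rank of 1.0 goes in the last bin
        counts[index] += 1
    return counts


# Illustrative raw ratios and normalization factors for two replicate experiments.
rep1 = percent_rank(normalize({"A": 1.8, "B": 2.2, "C": 2.0, "D": 12.0}, 2.0))
rep2 = percent_rank(normalize({"A": 1.0, "B": 0.8, "C": 1.2, "D": 5.0}, 1.0))
avg = average_ranks([rep1, rep2])
counts = histogram(avg)
```

In this toy data set, region D ranks highest in both replicates and so falls in the top bin, which is the behavior expected of a consistently enriched region. With real data, the list of bin counts can be plotted as a column chart to look for the bimodal distribution described in step 7.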

Troubleshooting. If the bimodal distribution is not observed, the problem may be corrected by adjusting the bins with the Histogram analysis tool. Some features of the distribution may have been obscured during the binning process. Create more bins with smaller intervals and look at the distribution of percentile ranks again. If restructuring the data fails, add more data from a replicate experiment. If a bimodal distribution is still not observed after adding data, examine the data to see whether the observed enrichments are inconsistent. If the problem is inconsistency, it may be beneficial to check whether the chromatin immunoprecipitation is working by using a PCR assay (see above). If, however, consistent enrichments are observed, the analysis may be too stringent. Candidate binding targets can then be identified on the basis of their consistent enrichment, especially if the potential binding targets flank genes that are functionally related or have similar transcriptional profiles.

Conclusions and Future Directions

ChIP-chip can be used to identify the genome-wide binding sites of transcription factors in yeast. When this technique is coupled with genome-wide expression analysis, the contribution of a particular transcription factor to the expression of each of its target genes can be determined. Target gene data can then be clustered by functional class and by expression profile. Not only will these approaches describe the transcriptional role of known and unknown transcription factors, but because transcription factors frequently regulate the expression of other transcription factors, these techniques should advance the description of transcriptional cascades, generating a temporal map of the transcriptional circuitry of the yeast cell.
Identifying all the genomic binding targets of a transcription factor will also be useful for elucidating the consensus binding sequence for uncharacterized transcription factors and refining the consensus sequences for previously studied DNA-binding proteins. The genomic segments bound by a factor can be scanned for common sequence motifs. There are many programs available for this type of analysis, including Consensus, 23 MEME, 24 and Gibbs' sampling. 25,26 It will also be interesting to see how promoter sequence motifs relate to functionally distinct targets or target clusters with similar expression profiles.

The ChIP-chip technique can be expanded to all DNA- and chromatin-interacting proteins. Using a microarray that includes all genomic elements, both ORFs and intergenic regions, the binding profiles of chromatin structural proteins, DNA silencing factors, and proteins involved in recombination can be examined. The technique can also be applied to proteins that do not directly interact with DNA. The chemical cross-linker formaldehyde fixes protein-protein interactions as well as protein-DNA interactions, so DNA indirectly associated with a protein can be immunopurified. The binding of signaling molecules important for the function or activation of a transcription factor can thus also be mapped on the genome. When the binding profiles are examined for all of these proteins under different environmental conditions and at different points in the cell cycle, we will gain a better understanding of transcriptional regulation and chromosome dynamics in yeast. The hope is, of course, to apply the ChIP-chip method to higher eukaryotes as their genomic sequences become available. Generating transcription factor binding profiles for higher eukaryotes will bring us closer to deciphering the complexities of gene regulation in these organisms.

Acknowledgments

Past and present members of the M. Snyder, P. Brown, and D. Botstein laboratories have contributed to the development of these protocols. We are especially grateful to V. R. Iyer and C. S. Scafe for pioneering work to develop the yeast intergenic microarray. Thanks also to J. Rinn for critical comments on this manuscript. C. Horak is supported by a predoctoral fellowship from the Howard Hughes Medical Institute.

23 G. D. Stormo and G. W. Hartzell, Proc. Natl. Acad. Sci. U.S.A. 86, 1183 (1989).
24 T. L. Bailey and C. Elkan, in "Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Menlo Park, California." American Association for Artificial Intelligence, 1994.
25 A. F. Neuwald, J. S. Liu, and C. E. Lawrence, Protein Sci. 4, 1618 (1995).
26 C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton, Science 262, 208 (1993).