Modern tools for identification of nucleic acid-binding proteins


Available online at www.sciencedirect.com

Biochimie 90 (2008) 1265–1272 www.elsevier.com/locate/biochi

Review

Nadia Hégarat a,b, Jean-Christophe François a,b,**, Danièle Praseuth a,b,*

a INSERM, U565 Case Postale 26, 57 rue Cuvier, 75231 Paris Cedex 05, France
b Muséum National d'Histoire Naturelle (MNHN) USM 503, CNRS UMR 5153 "Acides Nucléiques: dynamique, ciblage et fonctions biologiques", Case Postale 26, 57 rue Cuvier, 75231 Paris Cedex 05, France

Received 21 January 2008; accepted 21 March 2008 Available online 12 April 2008

Abstract

Numerous biological mechanisms depend on nucleic acid–protein interactions. The first step to the understanding of these mechanisms is to identify interacting molecules. Knowing one partner, the identification of other associated molecular species can be carried out using affinity-based purification procedures. When the nucleic acid-binding protein is known, the nucleic acid can be isolated and identified by sensitive techniques such as polymerase chain reaction followed by DNA sequencing or hybridization on chips. The reverse identification procedure is less straightforward in part because interesting nucleic acid-binding proteins are generally of low abundance and there are no methods to amplify amino acid sequences. In this article, we will review the strategies that have been developed to identify nucleic acid-binding proteins. We will focus on methods permitting the identification of these proteins without a priori knowledge of protein candidates.
© 2008 Elsevier Masson SAS. All rights reserved.

Keywords: Nucleic acid-binding protein; Protein library; Affinity chromatography; Multiprotein complexes

* Corresponding author. INSERM, U565 Case Postale 26, 57 rue Cuvier, 75231 Paris Cedex 05, France. Tel.: +33 1 40 79 37 10; fax: +33 1 40 79 37 05.
** Corresponding author. INSERM, U565 Case Postale 26, 57 rue Cuvier, 75231 Paris Cedex 05, France. Tel.: +33 1 40 79 38 01; fax: +33 1 40 79 37 05.
E-mail addresses: [email protected] (J.-C. François), [email protected] (D. Praseuth).
0300-9084/$ - see front matter © 2008 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.biochi.2008.03.012

1. Introduction

Nucleic acid–protein interactions are involved in numerous cellular processes such as replication, transcription, splicing, and DNA repair. In order to better understand these biological processes and to modulate nucleic acid–protein interactions in pathological situations, the identities of both partners are needed. Different strategies have been developed to determine nucleic acid sequences that bind specifically to a known protein [1], including methods that allow detection in vivo: for example, chromatin immunoprecipitation on chip (ChIP on chip) and Dam identification (DamID). ChIP on chip identification involves immunoprecipitation of the protein of interest crosslinked in vivo to its cognate nucleic acid sequences. Selected DNA or RNA regions are then recovered, amplified, and identified using a DNA chip. This approach was applied to determine transcription sites recognized by three transcription factors: Sp1, cMyc, and p53 [2]. The second method, DamID, involves fusion between the target protein and DNA adenine methyltransferase (Dam) from Escherichia coli. The 32 kDa Dam tag methylates GATC sites on the DNA close to the binding sites of the protein of interest. Using restriction enzymes specific for methylated or unmethylated GATC, the specific nucleic acid sequences are isolated and then identified by DNA chip analyses [3,4].

Nucleic acid synthesis and amplification are easy tasks achieved by chemical or molecular biological methods. Thus, the high-throughput screening of nucleic acid sequences that bind specifically to a protein target is straightforward, as shown by the identification of cis-elements of the p53 transcription factor [5]. The identification of specific proteins that bind to a certain nucleic acid is difficult because no method exists to amplify the amino acid sequence and chemical synthesis is difficult for long peptides [6]. Interesting proteins are usually of low

abundance. Moreover, proteins are generally post-translationally modified with phosphate, lipids and carbohydrates; these modifications are often essential for proper folding or for function but complicate protein sequencing and production.

Bioinformatic approaches allow discovery of putative nucleic acid-binding proteins using accumulated knowledge about various nucleic acid-binding domains. These data permit the prediction of amino acid sequences that interact with nucleic acids, and comparison of these sequences with those present in databases can identify candidate proteins [7,8]. The candidates can then be expressed and their nucleic acid-binding activities determined experimentally. This approach is useful; however, it requires prior knowledge of nucleic acid-binding domains.

Several methods allow detection of the interaction between a protein and a known nucleic acid sequence, such as electrophoretic mobility shift assay (EMSA), filter-binding, and DNA footprinting. These techniques have the advantage of detecting proteins that recognize certain nucleic acid sequences. The origin of protein pools used in these assays varies: proteins can be produced after the introduction of their coding sequence into an expression system, or proteins can be extracted from their natural environment. When these assays are coupled to other methods, the identity of the proteins bound to the nucleic acid can be determined.

This review focuses on strategies to identify nucleic acid-binding proteins assembled in complexes without knowledge of the protein partner. Numerous examples exist; we therefore apologize in advance for any work not cited.

2. From protein libraries

Numerous methods have been developed to produce multiple proteins on a large scale from cells or from cell-free expression systems. These advances allow preparation of protein libraries that can be further screened for interactions with various ligands, including nucleic acids.
Open reading frames (ORFs) present a genomic source for protein production. This approach has been widely used to make yeast protein libraries, thanks to the efficiency of homologous recombination in this organism. The nucleic acid sequence coding for a tag, a short peptide or protein domain, is introduced next to the ORF in the yeast genome and the tagged protein is synthesized. This fusion protein can be purified from yeast proteins using the characteristics of the tag, then assayed for the desired nucleic acid-binding activity [9]. In other organisms, homologous recombination is less efficient; therefore, recombination systems have been developed. For example, a defective λ prophage has been introduced into bacteria where it retains recombination functions [10]; this permits the introduction of a DNA sequence coding for a tag [11]. Another approach, named the Gateway recombination cloning system, is based on integration/excision of phage λ in E. coli [12]. This system allows production of fusion proteins from ORFs of yeast [13], Caenorhabditis elegans [14], and humans [15]. Protein libraries can also be produced from mRNA from whole organisms or tissues. Total mRNAs are extracted, reverse-transcribed, and introduced into appropriate expression vectors. Proteins are then synthesized in the host cell, extracted and

purified. An alternative method, phage display, results in expression and presentation of the protein on the phage surface. The nucleic acid-binding assay is then carried out with a phage library using the nucleic acid target immobilized on a chromatographic support. The phage expressing the protein of interest is isolated and the DNA coding for the expressed protein is sequenced [16]. Different biological entities such as eukaryotic cells or viruses can be used to present proteins at their surfaces [17]. In some cases, heterologously expressed proteins are toxic or unstable in the host cell, and cell-free systems prove useful [18,19]. Proteins can be physically linked to their coding mRNA (ribosome or mRNA display) or DNA (DNA display) [17]. They can also be obtained from a DNA plasmid. This last case allows high-throughput production of different proteins that are immobilized on a chip for nucleic acid-binding assays (Fig. 1A) [20].

Several methods are available for detecting nucleic acid–protein interactions in screens of protein libraries. EMSA is a classical approach. Following protein expression and partial purification, the resulting mixture of proteins is incubated with the nucleic acid target and the interaction is analysed on a native gel. The presence of a band shift indicates that a certain protein pool contains the nucleic acid-binding protein, and deconvolution of the selected pool leads to the protein identification [9]. Affinity purification using the nucleic acid sequence as bait is another widely used method, particularly in display technologies. The library of proteins, displayed on a cell surface or physically linked to their coding nucleic acid sequence, is incubated with the ligand of interest immobilized on a support. After washes, the retained proteins are eluted. Several enrichment rounds (incubation, washes, and elution) are needed to recover specific proteins.
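The need for several enrichment rounds can be illustrated with a short calculation. The Python sketch below is not taken from the cited work: it assumes, purely for illustration, that each round multiplies the ratio of specific to non-specific library members by a constant factor. The function names, the starting fraction and the enrichment factor are all hypothetical.

```python
def binder_fraction(initial_fraction, enrichment_per_round, rounds):
    """Fraction of specific binders in the pool after `rounds` selection rounds.

    Assumption (illustrative only): every round multiplies the odds
    binders : non-binders by the same factor `enrichment_per_round`.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must be strictly between 0 and 1")
    if enrichment_per_round <= 1.0:
        raise ValueError("enrichment_per_round must be greater than 1")
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment_per_round ** rounds
    return odds / (1.0 + odds)


def rounds_needed(initial_fraction, enrichment_per_round, target_fraction):
    """Smallest number of rounds after which binders reach `target_fraction`."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    rounds = 0
    while binder_fraction(initial_fraction, enrichment_per_round, rounds) < target_fraction:
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Hypothetical library: one specific binder per million members,
    # 100-fold enrichment per round.
    for n in range(5):
        print(n, round(binder_fraction(1e-6, 100, n), 6))
    print("rounds to reach 90%:", rounds_needed(1e-6, 100, 0.9))
```

Under these assumed values, a binder present at one part per million makes up about half of the pool after three rounds and about 99% after four, which is consistent with a single round being insufficient.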
The coding nucleic acid, present in cells or physically linked to the protein, is then sequenced [16,21]. Rapid and efficient methods have recently been developed to recover the coding nucleic acid [22]. Protein microarrays are the most appropriate method for high-throughput analysis of proteomes: each purified protein is separately spotted on a chip surface, then the microarray is incubated with the nucleic acid target. This ligand is usually labelled with a fluorescent molecule, permitting the detection of the nucleic acid-binding activity (Fig. 1B) [23]. This method can be extended to whole-organism proteomes [24]. For instance, 5800 different yeast proteins (80% of the yeast proteome) were deposited onto microarrays and more than 200 DNA-binding proteins were identified that interacted with single- and/or double-stranded genomic sequences [25].

Interaction assays can also be directly carried out in host cells such as yeast. DNA–protein interactions are detected by the yeast one-hybrid method, which is based on the activation of reporter gene expression (Fig. 2A). A yeast strain, containing in its genome the DNA region of interest as a cis-element target upstream of a reporter gene, is transformed with a plasmid library of sequences coding for different proteins linked to a transcription activation domain. If the protein interacts with the cis-element target, reporter gene expression is activated. Using this method, transcription factors have been identified in soybean [26] and a DNA–protein interaction


Fig. 1. Protein production and immobilization on a chip. (A) A plasmid coding for a tagged protein is immobilized on an array. The microarray is incubated with rabbit reticulocyte lysate to permit fusion protein expression. Tagged protein is immobilized on chip via its interaction with an antibody directed against the tag. (B) Each ORF is cloned in a vector that is suitable for protein expression in the chosen host cell. Proteins are then purified and spotted on slides. In both cases, protein microarrays are incubated with a nucleic acid, labelled with a fluorochrome, for example. After several washes, the interacting proteins are detected using fluorescence. Using the ORF sequence, the identity of the protein is determined.

network has been investigated in C. elegans [27]. A more sophisticated approach, the yeast three-hybrid system, has been developed to identify RNA-binding proteins (Fig. 2B). In this assay, an RNA molecule that contains constant and variable sequences is used. Proteins that are candidates for binding to the variable region are linked to a transcription activator domain, and a protein that recognizes the RNA constant sequence is coupled to a DNA-binding domain. This last domain recognizes the reporter gene promoter and, if an RNA–protein interaction occurs, reporter gene expression is activated [28]. This method can be applied to mRNA [29] as well as tRNA [30]. RNA sequences associated with a protein of interest have also been identified with the three-hybrid system [31].

The screening of protein libraries allows low abundance proteins to be detected when only a small amount of biological material is available. Indeed, expression in ectopic systems from the corresponding sequence (DNA or RNA) allows sufficient amounts of protein to be produced for analysis by classical methods. Thus, novel DNA-binding proteins have been

identified using cDNA libraries prepared by isolating mRNAs from oocytes [32]. Generally, these libraries contain mostly protein domains, although entire proteins have been expressed [33]. This may be an advantage as it can lead to the identification of amino acid sequences responsible for nucleic acid-binding activity [16]. This information allows inference of protein activities and may facilitate design of drugs to inhibit this interaction in a pathological context. The downside is the loss or the gain of nucleic acid-binding activities because proteins, or rather protein domains, are expressed in an artificial environment. Indeed, the possible effect of the tag and/or the absence of natural conditions can lead to inadequate protein folding and the loss of post-translational modifications. The nucleic acid-binding activities can also be altered by the lack of additional protein partners. Therefore, these methods are usually used to study binary interactions between one protein and its cognate nucleic acid sequence. Strategies for identification of novel nucleic acid-binding proteins, expressed in their natural environment, are required to

Fig. 2. Hybrid systems to identify nucleic acid-binding proteins. (A) The one-hybrid system to detect DNA–protein interactions. (B) The three-hybrid system to detect RNA–protein interactions. Constant elements are indicated in purple and variable elements in another color: the nucleic acid target is in pink and protein domain candidates are indicated in yellow. (Adapted from Ref. [28] by permission of Oxford University Press.)


complement the results obtained from screening of protein libraries.

3. From protein extracts

Proteins, extracted from whole organisms or tissues, may be separated by gel electrophoresis and then assayed for their binding to the nucleic acid sequence of interest. Two-dimensional (2D) gels have been used; the proteins are then transferred onto a membrane and renatured before being incubated with radiolabelled nucleic acid. This method can be applied to DNA (southwestern blot) to detect proteins that bind specifically to either a sequence or a structure [34], and also to RNA (northwestern blot) [35]. However, these techniques have limitations: incomplete renaturation can disrupt nucleic acid-binding activities and lead to false negatives. Another drawback is the incomplete recovery and/or imperfect electrophoretic resolution of basic, high molecular weight or hydrophobic proteins, a technical limitation inherent to 2D-electrophoresis. Proteins that interact

with the labelled nucleic acid are excised and identified by peptide sequencing using the Edman degradation reaction, now an automated method. Recent developments in mass spectrometry have made protein identification possible by peptide-mass fingerprinting and/or by peptide sequencing [36]. This technology is becoming more common in the study of nucleic acid–protein interactions [37].

Proteins and nucleic acids can be incubated in solution and interactions detected by EMSA. In combination with 2D-electrophoresis and mass spectrometry, the interacting proteins are identified. Two approaches have been developed. In the first, proteins are resolved on a 2D-gel and, in parallel, on 1D-gels according to their isoelectric point (pI) or their molecular mass. Proteins are extracted from 1D-gels and their nucleic acid-binding activities are evaluated by EMSA. The pI and the molecular mass of the specific protein are determined from its initial position on the 1D-gels. The corresponding spot on the 2D-gel is excised in order to carry out the protein identification (Fig. 3A) [38]. The second method is a simplified version of the first. A gel shift is carried out with or without the

Fig. 3. Identification of proteins analysed by gel shift on a nucleic acid gel (A) or protein gel (B). (A) Proteins are separated according to one dimension, pI or molecular mass, then proteins are eluted and assayed by gel shift. pI and molecular mass of proteins binding the nucleic acid are determined and mapped onto a 2D-gel. The spots are excised to identify candidate proteins. For simplicity, only one protein is represented in a gel slice. (B) Proteins are separated on a native gel in the presence or absence of nucleic acid, then in a second dimension according to their molecular mass. Comparison between the two gels allows selection of proteins of altered electrophoretic mobility in the presence of nucleic acid.


nucleic acid of interest and the proteins are then separated on a denaturing gel. The comparison between gels (with and without nucleic acid) permits the detection of proteins that interact specifically with the nucleic acid, since interaction with the nucleic acid leads to a modification of electrophoretic mobility (Fig. 3B) [39].

Proteins can also be isolated by purifying specific cellular compartments and/or by chromatographic methods, for example, ion-exchange chromatography or gel filtration. The chromatographic fractions obtained are analysed for nucleic acid-binding activity using a technique such as EMSA to detect which fraction contains the specific protein. Several steps, generally with different chromatographic methods, are needed to purify the specific protein to homogeneity. Most often, the purification scheme includes affinity chromatography using the nucleic acid target as bait. Yaneva and Tempst [40] described an approach for purification of transcription factors. The steps comprise an extraction of nuclear proteins and a phosphocellulose column followed by a series of affinity columns using the nucleic acid bearing either the specific site or a mutated site. An optimized purification scheme must be developed for each protein; thus, a generalization of the method is difficult to envisage. Several transcription factors have been purified with this strategy.

A one-step affinity purification can provide analytical amounts of specific proteins, thus permitting their identification. Affinity chromatography is carried out using the specific site and a mutated site, and the corresponding eluates are loaded on a denaturing gel. Differential analysis points out the specific protein(s) [41]. The comparison can also be directly carried out during the mass spectrometric analyses [42]. After elution, proteins are modified with a reagent bearing either a light isotope (hydrogen) or a heavy one (deuterium).
The proteins from both eluates are then mixed and digested by trypsin. The resulting peptides from the different eluates are analysed by mass spectrometry and are distinguished as light and heavy peptides. This approach, named isotope-coded affinity tag (ICAT), permits qualitative and quantitative analyses. Using this method, the Six4 transcription factor, which recognizes a sequence from the muscle creatine kinase gene, was identified among numerous co-purified proteins [43].

Purification of nucleic acid-binding proteins has also been carried out on a small scale using chips on which nucleic acid sequences are immobilized. This approach can be coupled to mass spectrometry analysis. Microarrays are incubated with protein extract and, after washes, the desorption–ionization of proteins is carried out upon matrix addition. The resulting ions are


analysed by mass spectrometry. This method, SELDI-TOF MS (surface enhanced laser desorption/ionization-time of flight mass spectrometry), identifies protein candidates by their molecular weights. Once candidates are identified, antibodies to them can be produced and used in immunoelectromobility shift assays [44]. In addition, as the purification conditions can be quickly optimized on the chips, affinity chromatography can then be carried out to identify specific proteins [45].

Isolating and then identifying low abundance proteins is difficult when starting with protein extracts from natural organisms; nevertheless, this approach allows purification of proteins in their native states when an appropriate extraction protocol is used. The three-dimensional structures and the post-translational modifications are mostly preserved under these conditions, and these features are often involved in nucleic acid-binding activities. Using mass spectrometric analyses, post-translational modifications can be detected.

The relevance of proteins identified by any of the above methods must be validated in cells, since protein modifications can occur throughout the protein purification and in vitro assays suppress cell compartmentalization. It would be interesting to identify proteins interacting with nucleic acid directly in cells, and some approaches have been developed. In one example, crosslinks were induced by ionizing radiation. Covalently linked DNA–protein complexes were purified and the proteins involved were identified by mass spectrometry. This work allowed identification of molecular mechanisms affected by irradiation [46]. A similar study was carried out with cisplatin, a drug used in cancer treatment, in order to understand cisplatin-resistance mechanisms [47]. A more general strategy has been developed to identify proteins that bind specifically to an mRNA sequence. A crosslinking agent was conjugated to a peptide nucleic acid (PNA) complementary to the mRNA of interest.
After its introduction into cells, the PNA hybridized with the target mRNA and under UV irradiation a crosslink was produced between the agent coupled to the PNA and proteins present on the mRNA. Proteins were then purified by affinity chromatography using the mRNA as bait and identified by mass spectrometry [48]. The interaction between the identified protein and the mRNA is theoretically relevant because the crosslink could only be the result of proximity of the molecular species at a given time. To our knowledge, no general method allows purification of complexes assembled in cells on a known nucleic acid sequence. However, nucleic acid-binding multiprotein complexes have been identified from protein libraries or from protein extracts.
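The ICAT comparison described in this section comes down, at the data-analysis stage, to comparing light and heavy peptide signals protein by protein. The Python sketch below illustrates that step only and is not the procedure of the cited studies. It assumes that the light reagent labelled the specific-site eluate and the heavy reagent the mutated-site eluate (the text does not say which eluate received which label), that peptides have already been assigned to proteins, and that a three-fold ratio is a reasonable cut-off. The protein names, intensities and threshold are hypothetical.

```python
from collections import defaultdict


def icat_ratios(peptide_intensities):
    """Return the light/heavy intensity ratio for each protein.

    `peptide_intensities` is an iterable of (protein, light, heavy) tuples,
    one per peptide pair. Assumption: light = specific-site eluate,
    heavy = mutated-site eluate.
    """
    light = defaultdict(float)
    heavy = defaultdict(float)
    for protein, light_signal, heavy_signal in peptide_intensities:
        light[protein] += light_signal
        heavy[protein] += heavy_signal
    ratios = {}
    for protein in light:
        if heavy[protein] > 0:
            ratios[protein] = light[protein] / heavy[protein]
        else:
            # Detected only in the specific-site eluate.
            ratios[protein] = float("inf")
    return ratios


def specific_candidates(ratios, min_ratio=3.0):
    """Proteins enriched at least `min_ratio`-fold on the specific site."""
    return sorted(p for p, r in ratios.items() if r >= min_ratio)


if __name__ == "__main__":
    # Hypothetical peptide pairs: (protein, light intensity, heavy intensity).
    pairs = [
        ("protein_A", 900.0, 100.0),
        ("protein_A", 600.0, 200.0),
        ("protein_B", 100.0, 100.0),  # binds both sites equally: contaminant
        ("protein_C", 50.0, 0.0),
    ]
    ratios = icat_ratios(pairs)
    print(ratios)
    print(specific_candidates(ratios))
```

Proteins with a ratio close to 1 bind the specific and mutated sites equally well and are treated as co-purified background; only proteins above the chosen threshold are kept as candidates.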

Table 1
Competition-based methods to uncouple nucleic acid–protein complexes from chromatographic support

| Immobilized molecule | Intermediate molecule | Molecule coupled to nucleic acid | Uncoupling agent | Reference |
|---|---|---|---|---|
| | | | Free nucleic acid target | [55] |
| Antibiotic | | Aptamer specific for the antibiotic | Free antibiotic | [56,57] |
| Streptavidin | | Aptamer specific for streptavidin | Biotin | [58] |
| Maltose | Fusion protein (maltose-binding protein and MS2 protein) | MS2 pre-mRNA | Maltose | [49] |
| Anti-myc | Fusion protein (myc and U1A) | U1 snRNA | Myc peptides | [50] |


Table 2
Cleavage-based methods to uncouple nucleic acid–protein complexes from chromatographic support

| Cleavable linker | Uncoupling agent | Reference |
|---|---|---|
| DNA | DNaseI | [59] |
| Restriction site | Restriction enzyme | [60] |
| Disulfide bond | Dithiothreitol | [61] |
| DNA sequence | Temperature | [62] |
| Photocleavable linker | Irradiation | [51,52] |

4. Identification of multiprotein complexes interacting with nucleic acids

The screening of protein libraries has been applied to identify multiprotein complexes bound to the nucleic acid of interest. An mRNA display strategy has been developed in order to purify a transcription factor complex [21]. Using the three-hybrid system, multiprotein complexes interacting with RNA can also be detected; in this case, the experiment can be carried out when at least one partner is known [28]. These methods are difficult to apply when highly complex protein assemblies are being investigated. Therefore, proteins extracted from their natural environment that retain structure and modifications are required, and affinity chromatographic methods are essential to the identification of members of these complexes.

It is difficult to use affinity chromatography to purify multiprotein assemblies efficiently. Indeed, the classical elution conditions, involving an increase in salt and/or detergent concentration, release not only proteins specifically bound to the immobilized nucleic acid but also proteins associated with the chromatographic support. Specific proteins are therefore contaminated by proteins bound to the support. Generally, comparison of the results obtained from affinity chromatography with nucleic acids with and without a mutated

site is useful but not sufficient. As the proteins are often of low abundance and numerous proteins are co-purified, several chromatographic steps are needed. However, this procedure is not suitable for the purification of multiprotein assemblies because the protein partners can be lost in the first steps. A solution to these problems consists of separating contaminants associated with the support from proteins in the complex, so that a single-round purification procedure can be envisaged (Tables 1 and 2). In addition, avoiding an increase in salt or detergent allows recovery of proteins in their native state.

Several uncoupling approaches have been developed. The first strategy is based on a competition between two molecules involved in the coupling of the nucleic acid to the support. The competition consists of adding to the elution buffer either one of the molecules in great excess or a similar molecule with a higher affinity, to allow recovery of the protein–nucleic acid complex in the buffer. Sophisticated competition strategies have been developed, involving intermediate molecules that bind the chromatographic support on one side and the nucleic acid–protein complex on the other [49,50]. Several molecular interactions have been exploited (Table 1). In another approach, a cleavage is induced between the nucleic acid–protein complex and the support. The nature of the cleavage agent varies; agents can be enzymatic, chemical, or physical (Table 2). A few groups have introduced a photocleavable linker that is cleaved by irradiation to release the complex from the support [51,52]. DNA- and RNA-binding proteins have been purified with this approach, as have proteins that bind to specific DNA structures such as double-strand breaks. This method avoids addition to the elution buffer of molecules that can be detrimental for further analyses such as 2D-electrophoresis or direct analyses by mass spectrometry.
The different strategies for release of specific proteins from chromatographic support are summarized in Fig. 4. Use of any

Fig. 4. Overview of methods permitting the release of nucleic acid-binding proteins. After washes, specific proteins bound to immobilized nucleic acid (A) can be recovered according to three strategies. The first consists of disrupting the nucleic acid–protein interactions (B). Other methods allow recovery of the nucleic acid–protein complex either by adding molecules in the elution buffer (C) or by applying physical agents (D).


affinity purification procedure and one of these uncoupling methods allows proteins to be recovered in their native states. These procedures can be carried out on preparative as well as analytical scales so that functional studies can be carried out.
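The two levers named for competition-based uncoupling, a competitor in great excess or a competitor of higher affinity, can be compared with a textbook single-site competitive binding expression. The Python sketch below is an illustration under strong simplifying assumptions and does not come from the cited studies: binding is treated as being at equilibrium, the support-linked partner is given an effective concentration, and all concentrations and dissociation constants are hypothetical values in arbitrary but consistent units.

```python
def fraction_retained(support_partner, kd_support, competitor, kd_competitor):
    """Fraction of complexes still held on the support at equilibrium.

    Single-site competitive binding: the site that anchors the complex is
    occupied either by the support-linked partner or by the free competitor.
    All four arguments are hypothetical, in the same concentration units.
    """
    if kd_support <= 0 or kd_competitor <= 0:
        raise ValueError("dissociation constants must be positive")
    if support_partner < 0 or competitor < 0:
        raise ValueError("concentrations must not be negative")
    support_term = support_partner / kd_support
    competitor_term = competitor / kd_competitor
    return support_term / (1.0 + support_term + competitor_term)


if __name__ == "__main__":
    # No competitor in the buffer: half of the complexes stay on the support.
    print(fraction_retained(1.0, 1.0, 0.0, 1.0))
    # Competitor in great excess, same affinity as the support-linked partner.
    print(fraction_retained(1.0, 1.0, 98.0, 1.0))
    # Modest excess of a competitor with ten-fold higher affinity.
    print(fraction_retained(1.0, 1.0, 10.0, 0.1))
```

In this simplified picture, raising the competitor concentration and lowering its dissociation constant both reduce the fraction of complex retained on the support, which is the qualitative behaviour that competition-based elution relies on.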


5. Conclusions


The purification of nucleic acid-binding protein assemblies is a challenge, particularly when these regulatory complexes are of low abundance compared to highly abundant proteins that bind to nucleic acids non-specifically. However, numerous strategies allow identification of specific nucleic acid-binding proteins. Different information about the nucleic acid-binding protein can be obtained according to the chosen method: the identity of interaction domains, the post-translational modifications, the enzymatic activities, or the identities of protein partners involved in a complex. The techniques we have reviewed are mainly in vitro methods; therefore, the subsequent validation of findings in cells is essential. A few strategies have been proposed to identify nucleic acid–protein interactions in cells, but these strategies cannot be generalized. The analysis of multisubunit complexes presupposes, in most cases, prior knowledge of one partner and its subsequent modification with a tag, so that affinity purification of the complexes can be carried out [53,54]. The challenge is to develop general methods to detect and identify unknown multiprotein complexes assembled in vivo on DNA. When this goal is met, nucleic acid–protein interactome maps can be created, complementing protein–protein interactome maps.


Acknowledgments

This work was funded in part by INSERM, CNRS, MNHN and INTAS 03-51-5281. N.H. benefits from a pre-doctoral fellowship from the Ministère délégué à l'enseignement supérieur et à la recherche (France) and from the Ligue nationale contre le cancer (France).


References

[1] F. Vigneault, S.L. Guérin, Regulation of gene expression: probing DNA–protein interactions in vivo and in vitro, Expert Rev. Proteomics 2 (2005) 705–718.
[2] S. Cawley, S. Bekiranov, H.H. Ng, P. Kapranov, E.A. Sekinger, D. Kampa, A. Piccolboni, V. Sementchenko, J. Cheng, A.J. Williams, R. Wheeler, B. Wong, J. Drenkow, M. Yamanaka, S. Patel, S. Brubaker, H. Tammana, G. Helt, K. Struhl, T.R. Gingeras, Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs, Cell 116 (2004) 499–509.
[3] B. van Steensel, S. Henikoff, Identification of in vivo DNA targets of chromatin proteins using tethered Dam methyltransferase, Nat. Biotechnol. 18 (2000) 424–428.
[4] B. van Steensel, Mapping of genetic and epigenetic regulatory networks using microarrays, Nat. Genet. 37 (Suppl.) (2005) S18–S24.
[5] C.L. Wei, Q. Wu, V.B. Vega, K.P. Chiu, P. Ng, T. Zhang, A. Shahab, H.C. Yong, Y. Fu, Z. Weng, J. Liu, X.D. Zhao, J.L. Chew, Y.L. Lee, V.A. Kuznetsov, W.K. Sung, L.D. Miller, B. Lim, E.T. Liu, Q. Yu, H.H. Ng, Y. Ruan, A global map of p53 transcription-factor binding sites in the human genome, Cell 124 (2006) 207–219.
[6] U.K. Blaschke, J. Silberstein, T.W. Muir, Protein engineering by expressed protein ligation, Methods Enzymol. 328 (2000) 478–496.
[7] L.S. Wyrwicz, L. Rychlewski, Identification of Herpes TATT-binding protein, Antiviral Res. 75 (2007) 167–172.
[8] K. Fujishima, M. Komasa, S. Kitamura, H. Suzuki, M. Tomita, A. Kanai, Proteome-wide prediction of novel DNA/RNA-binding proteins using amino acid composition and periodicity in the hyperthermophilic archaeon Pyrococcus furiosus, DNA Res. 14 (2007) 91–102.
[9] T.R. Hazbun, S. Fields, A genome-wide screen for site-specific DNA-binding proteins, Mol. Cell. Proteomics 1 (2002) 538–543.
[10] D. Yu, H.M. Ellis, E.C. Lee, N.A. Jenkins, N.G. Copeland, D.L. Court, An efficient recombination system for chromosome engineering in Escherichia coli, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 5978–5983.
[11] G. Butland, J.M. Peregrín-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, N. Krogan, M. Davey, J. Parkinson, J. Greenblatt, A. Emili, Interaction network containing conserved and essential protein complexes in Escherichia coli, Nature 433 (2005) 531–537.
[12] A.J. Walhout, G.F. Temple, M.A. Brasch, J.L. Hartley, M.A. Lorson, S. van den Heuvel, M. Vidal, GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes, Methods Enzymol. 328 (2000) 575–592.
[13] D.M. Gelperin, M.A. White, M.L. Wilkinson, Y. Kon, L.A. Kung, K.J. Wise, N. Lopez-Hoyo, L. Jiang, S. Piccirillo, H. Yu, M. Gerstein, M.E. Dumont, E.M. Phizicky, M. Snyder, E.J. Grayhack, Biochemical and genetic analysis of the yeast proteome with a movable ORF collection, Genes Dev. 19 (2005) 2816–2826.
[14] P. Lamesch, S. Milstein, T. Hao, J. Rosenberg, N. Li, R. Sequerra, S. Bosak, L. Doucette-Stamm, J. Vandenhaute, D.E. Hill, M. Vidal, C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions, Genome Res. 14 (2004) 2064–2069.
[15] P. Lamesch, N. Li, S. Milstein, C. Fan, T. Hao, G. Szabo, Z. Hu, K. Venkatesan, G. Bethel, P. Martin, J. Rogers, S. Lawlor, S. McLaren, A. Dricot, H. Borick, M.E. Cusick, J. Vandenhaute, I. Dunham, D.E. Hill, M. Vidal, hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes, Genomics 89 (2007) 307–315.
[16] C. Cicchini, H. Ansuini, L. Amicone, T. Alonzi, A. Nicosia, R. Cortese, M. Tripodi, A. Luzzago, Searching for DNA–protein interactions by lambda phage display, J. Mol. Biol. 322 (2002) 697–706.
[17] A. Sergeeva, M.G. Kolonin, J.J. Molldrem, R. Pasqualini, W. Arap, Display technologies: application for the discovery of drug and gene delivery agents, Adv. Drug Deliv. Rev. 58 (2006) 1622–1654.
[18] T. Sawasaki, T. Ogasawara, R. Morishita, Y. Endo, A cell-free protein synthesis system for high-throughput proteomics, Proc. Natl. Acad. Sci. U. S. A. 99 (2002) 14652–14657.
[19] M. He, M.J. Taussig, Eukaryotic ribosome display with in situ DNA recovery, Nat. Methods 4 (2007) 281–288.
[20] N. Ramachandran, E. Hainsworth, B. Bhullar, S. Eisenstein, B. Rosen, A.Y. Lau, J.C. Walter, J. LaBaer, Self-assembling protein microarrays, Science 305 (2004) 86–90.
[21] S. Tateyama, K. Horisawa, H. Takashima, E. Miyamoto-Sato, N. Doi, H. Yanagawa, Affinity selection of DNA-binding protein complexes using mRNA display, Nucleic Acids Res. 34 (2006) e27.
[22] N. Doi, H. Takashima, A. Wada, Y. Oishi, T. Nagano, H. Yanagawa, Photocleavable linkage between genotype and phenotype for rapid and efficient recovery of nucleic acids encoding affinity-selected proteins, J. Biotechnol. 131 (2007) 231–239.
[23] D.A. Hall, J. Ptacek, M. Snyder, Protein microarray technology, Mech. Ageing Dev. 128 (2007) 161–167.
[24] L.A. Kung, M. Snyder, Proteome chips for whole-organism assays, Nat. Rev. Mol. Cell Biol. 7 (2006) 617–622.
[25] D.A. Hall, H. Zhu, X. Zhu, T. Royce, M. Gerstein, M. Snyder, Regulation of gene expression by a metabolic enzyme, Science 306 (2004) 482–484.


[26] H.C. Park, M.L. Kim, S.M. Lee, J.D. Bahk, D.J. Yun, C.O. Lim, J.C. Hong, S.Y. Lee, M.J. Cho, W.S. Chung, Pathogen-induced binding of the soybean zinc finger homeodomain proteins GmZF-HD1 and GmZF-HD2 to two repeats of ATTA homeodomain binding site in the calmodulin isoform 4 (GmCaM4) promoter, Nucleic Acids Res. 35 (2007) 3612–3623.
[27] B. Deplancke, A. Mukhopadhyay, W. Ao, A.M. Elewa, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, J.S. Reece-Hoyes, I.A. Hope, H.A. Tissenbaum, S.E. Mango, A.J.M. Walhout, A gene-centered C. elegans protein–DNA interaction network, Cell 125 (2006) 1193–1205.
[28] D.S. Bernstein, N. Buter, C. Stumpf, M. Wickens, Analyzing mRNA–protein complexes using a yeast three-hybrid system, Methods 26 (2002) 123–141.
[29] E. Sarnowska, E.A. Grzybowska, K. Sobczak, R. Konopinski, A. Wilczynska, M. Szwarc, T.J. Sarnowski, W.J. Krzyzosiak, J.A. Siedlecki, Hairpin structure within the 3′ UTR of DNA polymerase beta mRNA acts as a post-transcriptional regulatory element and interacts with Hax-1, Nucleic Acids Res. 35 (2007) 5499–5510.
[30] M. Steiner-Mosonyi, D.M. Leslie, H. Dehghani, J.D. Aitchison, D. Mangroo, Utp8p is an essential intranuclear component of the nuclear tRNA export machinery of Saccharomyces cerevisiae, J. Biol. Chem. 278 (2003) 32236–32245.
[31] K.J.L. Riley, L.A. Cassiday, A. Kumar, L.J. Maher, Recognition of RNA by the p53 tumor suppressor protein in the yeast three-hybrid system, RNA 12 (2006) 620–630.
[32] M. Ganesan, K.R. Paithankar, M.V. Jagannadham, C.S. Sundaram, B.S. Murthy, L. Singh, Characterization of novel DNA-binding proteins expressed in snake oocyte cDNA library, Protein Expr. Purif. 53 (2007) 164–178.
[33] J. McCafferty, R.H. Jackson, D.J. Chiswell, Phage-enzymes: expression and affinity chromatography of functional alkaline phosphatase on the surface of bacteriophage, Protein Eng. 4 (1991) 955–961.
[34] F. Guillonneau, A.L. Guieysse, J.P.L. Caer, J. Rossier, D. Praseuth, Selection and identification of proteins bound to DNA triple-helical structures by combination of 2D-electrophoresis and MALDI-TOF mass spectrometry, Nucleic Acids Res. 29 (2001) 2427–2436.
[35] A.K. Nair, K.M.J. Menon, Isolation and characterization of a novel trans-factor for luteinizing hormone receptor mRNA from ovary, J. Biol. Chem. 279 (2004) 14937–14944.
[36] R. Aebersold, M. Mann, Mass spectrometry-based proteomics, Nature 422 (2003) 198–207.
[37] F. Rusconi, F. Guillonneau, D. Praseuth, Contributions of mass spectrometry in the study of nucleic acid-binding proteins and of nucleic acid–protein interactions, Mass Spectrom. Rev. 21 (2002) 305–348.
[38] A.J. Woo, J.S. Dods, E. Susanto, D. Ulgiati, L.J. Abraham, A proteomics approach for the identification of DNA binding activities observed in the electrophoretic mobility shift assay, Mol. Cell. Proteomics 1 (2002) 472–478.
[39] J.A. Stead, J.N. Keen, K.J. McDowall, The identification of nucleic acid-interacting proteins using a simple proteomics-based approach that directly incorporates the electrophoretic mobility shift assay, Mol. Cell. Proteomics 5 (2006) 1697–1702.
[40] M. Yaneva, P. Tempst, Affinity capture of specific DNA-binding proteins for mass spectrometric identification, Anal. Chem. 75 (2003) 6437–6448.
[41] E. Nordhoff, A.M. Krogsdam, H.F. Jorgensen, B.H. Kallipolitis, B.F. Clark, P. Roepstorff, K. Kristiansen, Rapid identification of DNA-binding proteins by mass spectrometry, Nat. Biotechnol. 17 (1999) 884–888.
[42] S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 17 (1999) 994–999.
[43] C.L. Himeda, J.A. Ranish, J.C. Angello, P. Maire, R. Aebersold, S.D. Hauschka, Quantitative proteomic identification of six4 as the trex-binding factor in the muscle creatine kinase enhancer, Mol. Cell. Biol. 24 (2004) 2132–2143.
[44] M.P. Font, M. Cubizolles, H. Dombret, L. Cazes, V. Brenac, F. Sigaux, M. Buckle, Repression of transcription at the human T-cell receptor Vbeta2.2 segment is mediated by a MAX/MAD/mSin3 complex acting as a scaffold for HDAC activity, Biochem. Biophys. Res. Commun. 325 (2004) 1021–1029.
[45] C.E. Forde, A.D. Gonzales, J.M. Smessaert, G.A. Murphy, S.J. Shields, J.P. Fitch, S.L. McCutchen-Maloney, A rapid method to capture and screen for transcription factors by SELDI mass spectrometry, Biochem. Biophys. Res. Commun. 290 (2002) 1328–1335.
[46] S. Barker, M. Weinfeld, J. Zheng, L. Li, D. Murray, Identification of mammalian proteins cross-linked to DNA by ionizing radiation, J. Biol. Chem. 280 (2005) 33826–33838.
[47] S.K. Samuel, V.A. Spencer, L. Bajno, J.M. Sun, L.T. Holth, S. Oesterreich, J.R. Davie, In situ cross-linking by cisplatin of nuclear matrix-bound transcription factors to nuclear DNA of human breast cancer cells, Cancer Res. 58 (1998) 3004–3008.
[48] J. Zielinski, K. Kilk, T. Peritz, T. Kannanayakal, K.Y. Miyashiro, E. Eiríksdóttir, J. Jochems, U. Langel, J. Eberwine, In vivo identification of ribonucleoprotein–RNA interactions, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 1557–1562.
[49] R. Das, Z. Zhou, R. Reed, Functional association of U2 snRNP with the ATP-independent spliceosomal complex E, Mol. Cell 5 (2000) 779–787.
[50] D. Piekna-Przybylska, B. Liu, M.J. Fournier, The U1 snRNA hairpin II as a RNA affinity tag for selecting snoRNP complexes, Methods Enzymol. 425 (2007) 317–353.
[51] J. Martinez, A. Patkaniowska, H. Urlaub, R. Lührmann, T. Tuschl, Single-stranded antisense siRNAs guide target RNA cleavage in RNAi, Cell 110 (2002) 563–574.
[52] N. Hégarat, G.M. Cardoso, F. Rusconi, J.C. Francois, D. Praseuth, Analytical biochemistry of DNA–protein assemblies from crude cell extracts, Nucleic Acids Res. 35 (2007) e92.
[53] G. Rigaut, A. Shevchenko, B. Rutz, M. Wilm, M. Mann, B. Séraphin, A generic protein purification method for protein complex characterization and proteome exploration, Nat. Biotechnol. 17 (1999) 1030–1032.
[54] C. Guerrero, C. Tagwerker, P. Kaiser, L. Huang, An integrated mass spectrometry-based proteomic approach: quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX) to decipher the 26 S proteasome-interacting network, Mol. Cell. Proteomics 5 (2006) 366–378.
[55] H. Tian, Combinatorial selection of RNA ligands for complex cellular targets: the RNA ligands-based proteomics, Mol. Cell. Proteomics 1 (2002) 99–103.
[56] M. Bachler, R. Schroeder, U. von Ahsen, StreptoTag: a novel method for the isolation of RNA-binding proteins, RNA 5 (1999) 1509–1516.
[57] K. Hartmuth, H.P. Vornlocher, R. Lührmann, Tobramycin affinity tag purification of spliceosomes, Methods Mol. Biol. 257 (2004) 47–64.
[58] C. Srisawat, D.R. Engelke, RNA affinity tags for purification of RNAs and ribonucleoprotein complexes, Methods 26 (2002) 156–161.
[59] R.A. Rieger, E.I. Zaika, W. Xie, F. Johnson, A.P. Grollman, C.R. Iden, D.O. Zharkov, Proteomic approach to identification of proteins reactive for abasic sites in DNA, Mol. Cell. Proteomics 5 (2006) 858–867.
[60] J.A. Ranish, E.C. Yi, D.M. Leslie, S.O. Purvine, D.R. Goodlett, J. Eng, R. Aebersold, The study of macromolecular complexes by quantitative proteomics, Nat. Genet. 33 (2003) 349–355.
[61] S.W. Ruby, J. Abelson, An early hierarchic role of U1 small nuclear ribonucleoprotein in spliceosome assembly, Science 242 (1988) 1028–1035.
[62] H. Gadgil, H.W. Jarrett, Oligonucleotide trapping method for purification of transcription factors, J. Chromatogr. A 966 (2002) 99–110.