Novel approaches to map protein interactions Daniel Figeys
Although we now have the sequence of the human genome at hand, we face the challenge of assigning function to the identified genes. Genes usually exert their function through proteins, and the role of proteins is to interact with other molecules. Therefore, if we could map the interactions of proteins we would be able to understand protein function. The challenge of mapping protein interactions is vast, and many novel approaches have recently been developed for this task using molecular biology, mass spectrometry and chemiproteomic techniques.
The challenge of mapping protein–protein interactions in humans is significant. Assuming that there are 30 000 genes, a one gene–one protein relationship and five interactions on average per protein, we can estimate that a minimum of 150 000 interactions are present in humans. In reality, the number of possible proteins derived from a gene is greater than one, in particular when protein modifications, splicing and point mutations are considered. This could easily push the number of interactions into the millions.
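The estimate above is simple arithmetic and can be reproduced in a few lines. The following is only a sketch: the function name and the "protein forms per gene" multipliers are illustrative assumptions, not measured values; only the 30 000 genes and five interactions per protein come from the text.

```python
def estimate_interactions(genes, proteins_per_gene=1, interactions_per_protein=5):
    """Crude count: each protein form contributes its own set of interactions."""
    return genes * proteins_per_gene * interactions_per_protein


# Baseline from the text: one gene -> one protein, five interactions each.
baseline = estimate_interactions(30_000)
print(f"baseline: {baseline}")

# Hypothetical multipliers for splice variants, modifications and point mutations.
for forms in (2, 5, 10):
    total = estimate_interactions(30_000, proteins_per_gene=forms)
    print(f"{forms} forms per gene: {total}")
```

With as few as ten protein forms per gene (an assumed figure), the same formula already exceeds one million interactions, which is the point made in the paragraph above.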
Addresses MDS-Proteomics, 251 Atwell Drive, Toronto, Ontario, Canada e-mail: d®
[email protected]
In the past year, novel approaches have been described for mapping protein interactions. The focus of this paper is to describe these novel technologies.
Current Opinion in Biotechnology 2003, 14:119–125
This review comes from a themed issue on Analytical biotechnology
Edited by Norm Dovichi and Dan Pinkel
0958-1669/03/$ – see front matter © 2003 Elsevier Science Ltd. All rights reserved.
DOI 10.1016/S0958-1669(02)00005-8
Abbreviations
FKBP12 FK 506 binding protein 12
FRB FKBP–rapamycin binding domain of FKBP–rapamycin-associated protein
JAK Janus kinase
LR leptin receptor
MAPPIT mammalian protein–protein interactions trap
MS mass spectrometry
STAT signal transducer and activator of transcription
Introduction
Although the Human Genome Project provided the sequence for roughly 30 000 genes, it did not provide information on their function. In pharmaceutical research, the lack of information on the function of these genes often deters their selection as potential targets. Proteins are generally the effector molecules that ascribe roles to genes. The function of a protein is to interact with other molecules; therefore, if one could instantly map these interacting molecules in space, time and milieu, then one would be able to understand the function of proteins. It is currently not possible to obtain such a level of resolution. Techniques have been developed, however, for the large-scale screening of protein interactions. It has also been demonstrated that the information derived from large-scale screens of protein interactions can be used to ascribe functions to proteins [1].
www.current-opinion.com
Protein–protein interactions
Protein–protein interactions are key to the formation of complexes and signal transduction through pathways. Furthermore, regulatory mechanisms, often achieved through protein–protein interactions, are in place to control the interactions of proteins. Regulation is achieved by the presence of post-translational modifications at specific amino acids on the protein. Over the years, high-throughput methods have been designed to discover protein–protein interactions, and in this section we will review these high-throughput methods.
Molecular biology approaches
Methods have been developed for the discovery of protein–protein interactions on the basis of molecular and cellular biology. The best known method is the yeast two-hybrid system, which provides a means to rapidly screen for protein–protein interactions [2,3]. The two-hybrid approach answers the question: is protein A binding to protein B? Molecular biology is used to express the first protein of interest, called the bait, attached to the DNA-binding domain of a transcription factor lacking the transcription activation domain. At the same time, a second protein, often called the prey, is expressed in yeast attached only to the transcription activation domain. Therefore, it is only when both modified proteins are simultaneously expressed and interact that the reporter gene is turned on. This method has been described in detail in other publications [2,4,5] and will not be reviewed in this paper. Other methods have been developed that report protein–protein interactions by the direct or indirect activation of enzymes. For example, Wehrman et al. [6] described a method to monitor protein–protein interactions in mammalian cells using the β-lactamase enzyme (Figure 1). In this approach, they utilized the complementarity of the
Figure 1 [schematic not reproduced; it shows bait–linker–ω198 and α197–linker–prey constructs being generated and assayed for β-lactamase activity]
Mapping mammalian protein interactions using a β-lactamase assay. In this approach, a construct is made to create a fusion protein of a protein bait and the ω198 fragment of β-lactamase. A second construct is also designed to create a fusion protein between a protein prey and the α197 fragment of β-lactamase. Stable cell lines that express both fusion proteins are generated. If the bait protein binds to the prey protein, β-lactamase activity is observed.
α and ω fragments of β-lactamase to build a protein interaction reporting assay. Fusion proteins can be constructed with the α and ω fragments, and the interaction can be tested by co-transfection in mammalian cells. If the two fusion proteins interact, the α and ω fragments are brought into close proximity and the activity of β-lactamase is recovered, allowing growth in ampicillin-containing media. Furthermore, the β-lactamase enzyme activity was increased by screening random tripeptides inserted at the breakpoint termini of the α197 and ω198 fragments. It was found that the tripeptide Asn-Gly-Arg, once inserted at the C terminus of α197, enhances the complementarity and the activity of the β-lactamase enzyme. The application of this approach in mammalian cells was illustrated using FKBP12 (FK 506 binding protein 12), which only binds the protein FRB (FKBP–rapamycin binding domain of FKBP–rapamycin-associated protein) in the presence of rapamycin. Constructs of a fusion protein of FKBP12 with the ω198 fragment and FRB with the enhanced α197 fragment were made. A stable mammalian cell line expressing both fusion proteins was generated. The treatment of this cell line with rapamycin generated strong β-lactamase enzymatic activity. Thus, the interaction between FKBP12 and FRB occurred, bringing the α197 and ω198 fragments together to form active β-lactamase. This clearly illustrates the potential of this approach for a protein interaction screen in mammalian cells. For this method to become readily applicable, it will have to be shown to work with transiently transfected cells instead of a stable cell line. Also, this method will have to be applied to a broader range of proteins with a wide range of Kd. Another approach, called mammalian protein–protein interactions trap (MAPPIT), was recently described by
Eyckerman et al. [7] (Figure 2). The authors took advantage of the mutation of critical tyrosine residues on the cytosolic domains of a receptor that leads to the loss of STAT (signal transducer and activator of transcription) activation, but not to loss of Janus kinase (JAK) protein activation (i.e. they still interact with the receptor). It was speculated that the recruitment of any other proteins that contain a STAT-binding site would reactivate the signaling. In their study, the authors used the long isoform of the murine leptin receptor (LR), which contains three conserved tyrosine residues on its cytosolic domain. One of the tyrosine residues is a STAT3 recruitment site, whereas the other two are involved in negative control. The mutation of these three sites from tyrosine to phenylalanine renders the recruitment of STAT3 and negative feedback inoperative. Once activated, this mutant receptor will recruit JAK, but will fail to recruit STAT. Fortunately, STAT3 also binds to other proteins such as the glycoprotein gp130, a protein involved in interleukin-6 signaling. The C-terminal fragment of gp130 contains four functional STAT3 recruitment sites and lacks negative feedback motifs. Therefore, gp130 can be used to recruit STAT3. The only link missing is the recruitment of gp130 to the receptor. This can be achieved by creating chimeric proteins that contain the sequence of two interacting partners (bait–prey). Thus, a bait protein of interest can be fused to the C terminus of the mutated LR protein, while a prey protein can be fused to the C-terminal fragment of gp130. If the bait and the prey interact, then STAT3 is recruited in the proximity of the receptor. The activation of the receptor by its ligand will then promote the recruitment of JAK. JAK phosphorylates STAT3, which then migrates to the nucleus and activates either a luciferase or puromycin resistance reporter system.
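The logic of the MAPPIT read-out described above reduces to a two-input AND gate: the reporter responds only when the receptor is stimulated by its ligand and the bait binds the prey. The sketch below is a deliberately simplified Boolean model; the class and function names are hypothetical, and it ignores expression levels, kinetics and background signaling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappitCell:
    ligand_present: bool   # mutant LR chimera is stimulated by its ligand
    bait_binds_prey: bool  # bait (fused to LR) interacts with prey (fused to gp130 fragment)


def reporter_active(cell: MappitCell) -> bool:
    """Return True when the luciferase/puromycin reporter would be switched on."""
    # The Tyr -> Phe mutant receptor still activates JAK when its ligand is bound.
    jak_active = cell.ligand_present
    # STAT3 docking sites reach the receptor only via the gp130-prey fusion.
    stat3_at_receptor = cell.bait_binds_prey
    # JAK can phosphorylate STAT3 only when both conditions hold.
    return jak_active and stat3_at_receptor


for ligand in (False, True):
    for binding in (False, True):
        cell = MappitCell(ligand_present=ligand, bait_binds_prey=binding)
        print(ligand, binding, reporter_active(cell))
```

Only the combination of ligand stimulation and a genuine bait–prey interaction yields a signal in this model, which is why unstimulated cells can serve as a built-in negative control.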
This means that when an interaction occurs between the bait and prey proteins, luciferase is activated or the cell can grow in puromycin-containing medium, depending on the reporter system used. The potential of this system was illustrated using the bait–prey pairs p53–SVT (p53–Simian virus 40 large T antigen) and EpoR–CIS (erythropoietin receptor–cytokine-inducible SH2-containing protein) [8]. The MAPPIT approach clearly differentiated these interactions. This approach could potentially be used as a high-throughput interaction screen.
Approaches based on mass spectrometry
The coupling of molecular biology, cellular biology and mass spectrometry (MS) allows scientists to specifically interrogate cells. By combining these approaches, it is now possible to address the question: what are the proteins that interact in a specific cell line or tissue with protein A? Briefly, this approach consists of four steps. The first step is to use molecular biology to construct a vector that encodes a protein (the bait) of interest to which a tag (e.g. an epitope tag such as FLAG [9]) is added to the N- or C-terminal end. The FLAG epitope consists of the amino acid sequence DYKDDDDK (in single-
Figure 2 [schematic not reproduced; panels (a)–(e) show the ligand, JAK, phosphorylated tyrosine residues (p-Y), bait, prey and STAT3, ending in the induction of luciferase activity or puromycin resistance]
Mapping mammalian protein interactions using a mutant LR receptor. (a) A construct is made to express a fusion protein that contains a bait protein (green) attached to the LR receptor (blue). The LR receptor is mutated at the STAT3-binding sites and the negative regulation sites (Tyr → Phe). (b) When activated by a ligand, the LR receptor recruits JAK protein (orange). (c) A second construct expresses a fusion protein between gp130 (red; a STAT3-binding protein) and a prey protein (yellow). If an interaction occurs between the bait and the prey protein, then gp130 and the LR receptor are brought into close proximity. (d) gp130 recruits STAT3 (grey), which is then phosphorylated by JAK. Phosphorylated tyrosine residues are represented by p-Y. (e) The STAT3 homodimer migrates to the nucleus and activates a reporter gene.
letter amino acid code) added to one of the terminal ends of the protein. Specific antibodies are available for that epitope; therefore, proteins that have the short FLAG epitope can be readily immunopurified. The second step is to transfect cells with the vector of interest and let normal localization and interaction occur. The third step is to immunopurify the bait protein and its interacting partners. Finally, the fourth step is to use MS to unambiguously identify the proteins. Remarkably, this method does not require any prior knowledge of the interacting proteins and can routinely identify novel protein interactions [10,11]. One advantage of this approach is that the interactions are formed in a relevant cell line (i.e. human proteins are
expressed in human cells). A second advantage is that the tag, such as the FLAG-tag, is often small and has limited interference with the protein function or its localization. One disadvantage of this approach is that a clone must be available for the protein of interest. For small genomes, homologous recombination approaches can be used to insert a tag directly into the gene in the genome. Fortunately, large collections of human clones are now available and it is to be expected that the majority of human gene clones will be available in the coming years. Recently, examples of the high-throughput mapping of protein complexes by MS have been reported [12,13]. In an effort led by Gavin [12], protein–protein interactions were mapped in yeast using MS as a read-out.
Molecular biology was first used to directly insert a gene-specific cassette containing a tag (TAP [tandem affinity purification] tag) at the 3′ end of the genes. Some 1739 genes from yeast were selected and tagged. Of this group, 1167 genes were expressed in yeast. The gene-expressing yeast cells were grown to log phase and the bait proteins and their interactors purified by affinity purification using the dual affinity purification approach [14]. Each immunopurification was separated by one-dimensional gel electrophoresis and the protein bands were analyzed using matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS). A total of 1440 yeast proteins were found to interact with the 459 successful bait proteins, representing 25% of the yeast genome. This approach provided an illustration of the high-throughput identification of protein complexes in yeast. One disadvantage of the approach was its poor success rate (only 26% of the genes targeted succeeded in providing information). This might be acceptable in a random clone selection approach, but would be problematic for a directed approach focused on a group of proteins. Another method for the high-throughput mapping of complexes by MS was simultaneously reported in the same issue of Nature by Ho et al. [13]. Although the experiments were performed in yeast, all of the methodology was developed to be readily applicable for the mapping of human protein complexes. First, recombinant-based cloning was used to add a FLAG epitope tag to 725 yeast genes, which were then transfected into yeast. (Recombinant-based cloning is a method that can be easily applied in humans, especially as human gene clone collections are available.) The baits and their interactors were then immunopurified and separated by
Figure 3
Segment of the network of protein interactions in yeast discovered using a high-throughput MS-based proteomic approach. The blue lines represent interactions that were already known to exist in the literature. These clones were transiently expressed in yeast and their interactors unambiguously identified by MS. (Reproduced from [12] with permission.)
Figure 4 [schematic not reproduced; legend: known interaction observed by MS; iterative proteome walk; novel interaction identified by MS; arrowhead points to prey; double-headed arrow indicates reciprocal immunoprecipitation]
Example of iterative interaction mapping by MS to highlight complexes. The figure illustrates an iterative walk involving Lsm protein complexes in yeast. There are nine Lsm proteins in yeast, which are known to associate with each other through the Sm-like motif. (a) The Lsm proteins Lsm1p through to Lsm7p are involved in mRNA decay in the cytoplasm (blue ellipse) and Lsm2p through to Lsm8p are involved in pre-mRNA splicing in the nucleus (green ellipse). A four-step iterative walk using Lsm8p, Lsm2p, Lsm4p and Pat1p as bait proteins revealed their interactors. (These results were taken from [12].) It is notable that one set of experiments identified most of the known interactors in these complexes. (b) Diagram of the known members (from the literature) of the pre-mRNA splicing factors and the mRNA-decapping factors and some of their known interactors.
gel electrophoresis. The separated proteins were proteolytically digested and the resulting peptides analyzed using MS. Electrospray ionization tandem mass spectrometry (ESI-MS/MS) was utilized to generate fragmentation patterns related to the amino acid sequence of the peptides. These patterns were searched against the yeast genome sequence database and provided unambiguous identification of the proteins. Tandem mass spectrometry can also provide unambiguous identification of proteins in humans. Interactions were discovered for about 70% of the yeast clones for a total of 1578 different interacting proteins, representing 25% of the yeast genome. Figure 3 illustrates the network of protein interactions that were discovered in this study. It is important to realize that proteins are involved in multiple complexes during their life cycle. The immunopurification/MS-based approaches provide a still picture of all these complexes. Fortunately, the different complexes can be dissociated by turning interactors into bait proteins. This is clearly illustrated in Figure 4, in which prey proteins that were found with a yeast bait protein were turned into baits. In fact, up to four bait–prey rounds are apparent in this dataset. Furthermore, this example demonstrates that proteins are often involved in different complexes.
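The iterative strategy of turning prey proteins into baits amounts to a breadth-first walk over the interaction network. The sketch below illustrates the idea under stated assumptions: the function name is hypothetical, the pull-down table is invented toy data with placeholder protein names (it is not taken from the studies cited), and real experiments would also need controls for non-specific binders.

```python
from collections import deque


def iterative_walk(pulldowns, start_bait, max_rounds=4):
    """Breadth-first bait -> prey walk.

    pulldowns maps each bait to the set of preys identified with it by MS.
    Returns {protein: round in which it was first seen}; the start bait is round 0.
    """
    seen = {start_bait: 0}
    queue = deque([start_bait])
    while queue:
        bait = queue.popleft()
        round_no = seen[bait]
        if round_no >= max_rounds:
            continue  # do not use proteins found in the last round as new baits
        for prey in sorted(pulldowns.get(bait, ())):
            if prey not in seen:
                seen[prey] = round_no + 1
                queue.append(prey)  # this prey becomes a bait in the next round
    return seen


# Hypothetical pull-down results (placeholder names only).
toy_pulldowns = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "D": {"E"},
    "E": {"F"},
    "F": {"G"},
}

walk = iterative_walk(toy_pulldowns, "A", max_rounds=4)
print(walk)
```

In this toy network the walk stops after four bait–prey rounds, so protein "G", which would require a fifth round, is never reached. Proteins that are first seen in different rounds but share partners are candidates for membership in more than one complex.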
Protein–small-molecule interactions: chemiproteomics
Chemiproteomics aims to use small molecules as affinity material for the discovery of small-molecule binding proteins [15–17]. This approach could be used to find drug targets. The targets of many drugs are unclear; finding their targets would be a first step in directed drug optimization. Also, this approach could be used to study the toxic effects of a drug and its metabolites through studies of their protein interactors. Furthermore, this method could be used to discover new targets for drugs that are already on the market, expanding their labels. Chemiproteomics uses small molecules as baits to fish for interacting proteins. In a typical experiment, the drug of interest is tethered at positions that are not expected to interfere with the drug–protein interaction. The proper tethering of the drug is the key to the success of the experiment and requires experts in organic and medicinal chemistry. Once different versions of the tethered drug are available, they are exposed to cell lysates to capture interacting proteins. After appropriate rinsing, the binding proteins are eluted and analyzed by MS. The end result is a list of proteins that were observed to interact with the tethered small molecules. These proteins can then be validated through classical validation approaches. The affinity purification of proteins based on immobilized small molecules has been tried in the past with mixed success. This was in large part due to the lack of identification of the binding proteins, which limited optimization. Fortunately, the addition of advanced MS techniques to drug-based affinity purification can provide the rapid elucidation of drug–protein interactions. The limiting step in the approach will remain the proper design of the tethered versions of the drug. Directed solid-phase combinatorial chemistry could potentially provide high volumes of compounds that are compatible with this approach [18].
Carbohydrate–protein interactions
In their normal function, many enzymes and other proteins interact with carbohydrates. These proteins have a critical role in microbe–host and parasite–host interactions, they are important players in the process of immunity, and they are involved in protein trafficking and secretion. Recently, Fukui et al. [19] introduced oligosaccharide microarrays that can be used for the detection of specific protein–carbohydrate interactions. Briefly, these arrays are fabricated on nitrocellulose and PVDF (polyvinylidene fluoride) membranes using a neoglycolipid technology [20] that allows the generation of lipid-linked oligosaccharides based on glycoproteins and polysaccharides. This approach was used to construct arrays of oligosaccharides derived from natural sources and chemical synthesis. For example, a small array of carbohydrates derived from the brain was established and successfully probed using carbohydrate differentiation antigens. Although these experiments are at the proof-of-principle level, they are opening the door to new opportunities. For example, the combination of these arrays with tissue probing and MS identification of proteins could prove to be a powerful approach for the discovery of novel carbohydrate-binding proteins.
Conclusions
The mapping of protein interactions in humans will be the key to a better understanding of protein functions and diseases. Novel tools are available to accelerate the deciphering of protein interactions. It is clear that the data derived from the mapping of protein interactions can also lead to the discovery of novel elements in pathways and potential new drug targets. Biotechnology companies that are involved in mapping protein interactions will probably remain focused on areas of early pharmaceutical value. It is very likely that the mapping of the human 'interactome' will have to be driven from academic centers.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
1. Schwikowski B, Uetz P, Fields S: A network of protein–protein interactions in yeast. Nat Biotechnol 2000, 18:1257-1261.
2. Chien C-T, Bartel PL, Sternglanz R, Fields S: The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc Natl Acad Sci USA 1991, 88:9578-9582.
3. Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, Nishizawa M, Yamamoto K, Kuhara S, Sakaki Y: Toward a protein–protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA 2000, 97:1143-1147.
4. Uetz P: Two-hybrid arrays. Curr Opin Chem Biol 2002, 6:57-62.
5. Ito T, Chiba T, Yoshida M: Exploring the protein interactome using comprehensive two-hybrid projects. Trends Biotechnol 2001, 19:S23-S27.
6. Wehrman T, Kleaveland B, Her J-H, Balint RF, Blau HM: Protein–protein interactions monitored in mammalian cells via complementation of β-lactamase enzyme fragments. Proc Natl Acad Sci USA 2002, 99:3469-3474.
An interesting paper showing a hybrid screen based on the complementarity of subunits of β-lactamase.
7. Eyckerman S, Verhee A, van der Heyden J, Lemmens I, van Ostade X, Vandekerckhove J, Tavernier J: Design and application of a cytokine-receptor-based interaction trap. Nat Cell Biol 2001, 3:1114-1119.
An exciting paper showing a hybrid screen in which the bait–prey interaction brings the right proteins to a mutated protein receptor for the reactivation of STAT3 signaling.
8. Li B, Fields S: Identification of mutations in p53 that affect its binding to SV40 large T antigen by using the yeast two-hybrid system. FASEB J 1993, 7:957-963.
9. Einhauer A, Jungbauer A: The FLAG peptide, a versatile fusion tag for the purification of recombinant proteins. J Biochem Biophys Methods 2001, 49:455-465.
10. Figeys D: Functional proteomics: mapping protein–protein interactions and pathways. Curr Opin Mol Ther 2002, 4:210-215.
11. Figeys D, McBroom LD, Moran MF: Mass spectrometry for the study of protein–protein interactions. Methods 2001, 24:230-239.
12. Gavin A-C, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon A-M, Cruciat C-M et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415:141-147.
Demonstration using yeast proteins to show that MS can be used for the high-throughput mapping of protein–protein interactions.
13. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams S-L, Millar A, Taylor P, Bennett K, Boutilier K et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415:180-183.
This paper also demonstrates that MS can be used for the mapping of protein–protein interactions. Furthermore, the technology developed in this study can be applied in humans.
14. Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M, Seraphin B: A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol 1999, 17:1030-1032.
15. Henion J, Li Y, Hsieh Y, Ganem B: Mass spectrometric investigations of drug–receptor interactions. Ther Drug Monit 1993, 15:563-569.
16. Sin N, Meng L, Auth H, Crews C: Eponemycin analogues: syntheses and use as probes of angiogenesis. Bioorg Med Chem 1998, 6:1209-1217.
17. Figeys D: Proteomics approaches in drug discovery. Anal Chem 2002, 74:412A-419A.
18. Lam KS, Renil M: From combinatorial chemistry to chemical microarray. Curr Opin Chem Biol 2002, 6:353-358.
19. Fukui S, Feizi T, Galustian C, Lawson AM, Chai W: Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate–protein interactions. Nat Biotechnol 2002, 20:1011-1017.
This paper demonstrates the design of oligosaccharide microarrays for the study of protein–carbohydrate interactions. The technology is at an early stage.
20. Tang PW, Gooi HC, Hardy M, Lee YC, Feizi T: Novel approach to the study of the antigenicities and receptor functions of carbohydrate chains of glycoproteins. Biochem Biophys Res Commun 1985, 132:474-480.