Analysis of protein interaction networks using mass spectrometry compatible techniques


Analytica Chimica Acta 564 (2006) 10–18

Review

Martin Ethier¹, Jean-Philippe Lambert¹, Julian Vasilescu¹, Daniel Figeys∗

The Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, Ont., Canada K1H 8M5

Received 20 September 2005; received in revised form 7 December 2005; accepted 12 December 2005; available online 23 January 2006

Abstract

The ability to map protein–protein interactions has grown tremendously over the last few years, making it possible to envision the mapping of whole or targeted protein interaction networks and to elucidate their temporal dynamics. The use of mass spectrometry for the study of protein complexes has proven to be an invaluable tool due to its ability to unambiguously identify proteins from a variety of biological samples. Furthermore, when affinity purification is combined with mass spectrometry analysis, the identification of multimeric protein complexes is greatly facilitated. Here, we review recent developments for the analysis of protein interaction networks by mass spectrometry and discuss the integration of different bioinformatic tools for predicting, validating, and managing interaction datasets.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Proteomics; Mass spectrometry; Protein interactions; Immunoprecipitation

Contents

1. Introduction
2. Immunoprecipitation approach
2.1. Affinity purification
2.1.1. Epitope tagging approach
2.1.2. FLAG tag
2.1.3. c-Myc tag
2.1.4. Tandem affinity purification
2.1.5. Affinity purification drawbacks
2.2. Chemical cross-linking
2.2.1. In vivo cross-linking
2.2.2. Cross-linking chemistry
2.2.3. Recent applications
3. Bioinformatics
3.1. Interaction prediction
3.2. Interaction validation
3.3. Interaction management
4. Conclusion
Acknowledgements
References

∗ Corresponding author. Tel.: +1 613 562 5800x8674; fax: +1 613 562 5452. E-mail address: [email protected] (D. Figeys).
¹ These authors contributed equally.

0003-2670/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2005.12.046


M. Ethier et al. / Analytica Chimica Acta 564 (2006) 10–18


1. Introduction

A fundamental characteristic of proteins is that they interact with one another. Such interactions can lead to the formation of larger complexes or to changes in the proteins' spatial distribution. The study of protein–protein interactions provides a better understanding of the roles of specific proteins and of the regulation of cell signaling events. Although well-established techniques such as the yeast two-hybrid screen [1] and LUMIER [2] allow the identification of novel protein–protein interactions, they are limited to binary interactions, which restricts their usefulness for the high-throughput analysis of large interaction networks. Fig. 1 illustrates several approaches for the analysis of multimeric protein complexes. One or more proteins of interest can be used as “bait” in an affinity purification step, with or without cross-linking, to identify potential interaction partners, or “preys”, which in turn may provide novel insights into protein function. Bioinformatic tools can also be used to predict protein function or interaction partners. The results obtained from these experiments can be verified either experimentally or with other computational approaches. The results are then stored and browsed in databases and visualized with graphical software applications. In the following sections, we discuss the experimental methods available for the analysis of protein interaction networks, including affinity purification and chemical cross-linking coupled with mass spectrometry (Section 2), and the bioinformatics tools available for protein–protein interaction prediction, validation, and management (Section 3). Comprehensive reviews of the basic principles of mass spectrometry are available elsewhere [3–6].

Fig. 1. Schematic representation of protein–protein interaction exploration. Blue: experimental. Red: bioinformatics. Purple: both. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

2. Immunoprecipitation approach

A major obstacle to identifying native protein complexes by mass spectrometry is isolating sufficient amounts of purified material from cells. The traditional approach uses antibodies directed against a protein of interest, the “bait”, to isolate its interaction partners, the “preys”. This method has been used successfully for many years because it is relatively simple and quick to implement. A typical immunoprecipitation experiment involves several steps: cell culture, cell lysis, pre-clearing of the whole cell lysate, binding of the antibody to the bait, washing, and elution of the protein complex of interest. The approach remains in use, as highlighted by recent publications [7,8]. One of its major advantages is that protein interactions are studied in their native environment and at their native concentrations, which reduces the likelihood of experimental artifacts. Both monoclonal and polyclonal antibodies have been used successfully. Monoclonal antibodies are generally preferred for immunoprecipitation because they usually have better specificity than polyclonal antibodies (see [9] for a detailed review of antibodies). Nevertheless, polyclonal antibodies are also widely used because they are easier to produce.

2.1. Affinity purification

Traditionally, affinity purifications have been performed using antibodies directed against a specific protein. In this approach, the experimental conditions must be optimized for every antibody–protein pair studied, which can be a long and costly process. Because this limits large-scale studies, it has been necessary to develop standardized affinity purification protocols based on the addition of specific tags to the bait proteins. We review some of these approaches below.

2.1.1. Epitope tagging approach

A method that has gained popularity in recent years is the use of epitope tags to isolate protein complexes. This method relies on adding an epitope tag to either end of a protein and expressing the fusion in appropriate cells. To do so, the full-length clone of the bait is inserted into an appropriate vector with a suitable promoter and a sequence coding for the epitope tag. The vector is then transfected into appropriate cells and the tagged bait is over-expressed. A variety of epitope tags have been developed and used successfully to study protein–protein interactions. One advantage of epitope tagging is that it is well suited to large-scale studies because the same protocol is used for every bait (Fig. 2a): the protocol can be optimized on a limited number of samples and then applied to a larger sample set. This is possible because the interaction between the epitope and the antibody remains the same even when the baits differ. In addition, most of the materials required for epitope tagging (i.e. antibodies, constructs, and buffer solutions) are commercially available, which further improves reproducibility.

2.1.2. FLAG tag

The FLAG tag is a hydrophilic octapeptide developed by Hopp et al. in 1988 [10] that can be placed at either the N- or C-terminus of a protein of interest for immunoprecipitation [11]. The FLAG epitope is recognized by different anti-FLAG antibodies (M1, M2, and M5), which allows various binding conditions to be used in the immunoprecipitation procedure [12]. It is still unclear why calcium plays a role in FLAG binding by certain anti-FLAG antibodies but not by others [13,14].
Nevertheless, depending on the antibody chosen, proteins can be eluted with a calcium chelator, at low pH, or with competing FLAG peptide. The purification protocol can therefore be tailored to the complex of interest to improve recovery and purity. An elegant example of the FLAG tag system combined with mass spectrometry to systematically study protein–protein interactions in the budding yeast Saccharomyces cerevisiae was published by Ho et al. [15]. Using 725 bait proteins with various cellular functions, they detected over 3000 associated proteins, covering an estimated 25% of the yeast proteome. After the dataset was filtered with stringent parameters, 1578 interactions were reported. This landmark study enabled functions to be assigned to hundreds of proteins, in particular to 531 proteins that were previously unobserved. Several pathways were studied in greater detail. The DNA damage response (DDR) pathway, for example, was studied with 86 bait proteins, which confirmed the roles of numerous proteins. In addition, many novel interactions, such as Rfc4–Ddc1, were identified, improving the overall understanding of the yeast response to DNA damage.
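
As a minimal sketch of the tagging step at the protein-sequence level, the Python function below fuses the FLAG epitope (DYKDDDDK) to either terminus of a bait. The function name `fuse_tag`, the optional linker, the rule of keeping an initiator methionine ahead of an N-terminal tag, and the example bait sequence are illustrative assumptions only; real constructs are designed at the DNA level and also need a suitable promoter and an in-frame fusion.

```python
FLAG = "DYKDDDDK"  # FLAG epitope (octapeptide)


def fuse_tag(bait: str, tag: str, terminus: str, linker: str = "") -> str:
    """Return the protein sequence of `bait` with `tag` fused at one terminus.

    Illustrative assumptions: sequences are one-letter amino acid strings,
    `linker` is an optional spacer (hypothetical parameter), and for an
    N-terminal fusion the initiator methionine is kept at position 1.
    """
    bait = bait.upper()
    if terminus == "N":
        # Move the initiator Met (if present) in front of the tag.
        body = bait[1:] if bait.startswith("M") else bait
        return "M" + tag + linker + body
    if terminus == "C":
        return bait + linker + tag
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    bait = "MSTNPKPQRK"  # hypothetical bait fragment, not a real protein
    print(fuse_tag(bait, FLAG, "N"))
    print(fuse_tag(bait, FLAG, "C", linker="GGS"))
```

The function does not depend on the bait at all, which mirrors the point made in Section 2.1.1: the tag, not the bait, determines the purification chemistry, so one protocol can serve many baits.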

2.1.3. c-Myc tag

The c-Myc tag is a 10-amino-acid epitope derived from the product of the human c-myc gene. It is recognized by the anti-c-Myc monoclonal antibody 9E10 [16]. The tag can be fused to either the N- or C-terminus of a protein with equally good results. The purification protocol involves binding the tagged protein to 9E10 antibody-coupled beads, washing the bound protein complexes, and eluting with a low-pH buffer. A recent example of the use of the c-Myc tag to study protein–protein interactions in budding yeast was reported by Seol et al. [17]. A c-Myc9 tag was fused to Cdc53 and Skp1, and affinity purification coupled to mass spectrometry was performed. Fifteen different interaction partners were identified, including Rav1 and Rav2. To characterize these proteins further, c-Myc9-tagged Rav1 and Rav2 were prepared and used in additional affinity purifications. Mass spectrometry identified three subunits of the V-ATPase complex, suggesting a role for Skp1, Rav1, and Rav2 in the regulation of V-ATPase, a complex implicated in tumor metastasis and multidrug resistance.

2.1.4. Tandem affinity purification

Tandem affinity purification (TAP) was introduced by Rigaut et al. to isolate interacting proteins with reduced background [18]. The method relies on the same principle as epitope tagging but uses two successive tags instead of one (Fig. 2c). As in epitope tagging, the first step is fusing the clone of interest to the TAP tag and expressing the fusion in the appropriate cells and/or organisms. After cell lysis, the mixture is purified on a column containing IgG beads. The protein A portion of the TAP tag binds selectively to the IgG, which allows most contaminants to be removed by washing.
The protein complexes are then eluted from the IgG column with tobacco etch virus (TEV) protease, which cleaves a specific seven-amino-acid site within the TAP tag (Fig. 2c). The eluate is further purified on calmodulin beads, to which the calmodulin-binding peptide (CBP) portion of the TAP tag binds. After further washing, the protein complexes are eluted with a calcium chelator such as EGTA. Gavin et al. reported the first large-scale use of the TAP tag in combination with mass spectrometry, analyzing more than 1700 proteins in the budding yeast S. cerevisiae [19]. Over 500 protein assemblies were purified and analyzed, with a success rate of approximately 80%, and novel functions were proposed for more than 300 proteins. Thirteen purifications were performed in duplicate, which showed that about 70% of interaction partners could be identified consistently. Discrepancies between experiments were attributed to the loss of weak and/or transient interactions and to co-purifying contaminants. One interaction network that was studied successfully is the polyadenylation machinery, for which affinity purifications with three baits identified 12 known interaction partners and yielded seven novel interactions.

A study of the pro-inflammatory cytokine tumor necrosis factor-α (TNF-α) using the TAP tag in HEK293 cells was also published recently [20]. The authors used 32 known or postulated TAP-tagged components of the TNF-α/NF-κB pathway and identified 680 proteins by mass spectrometry. One hundred and seventy-one known interactions were reported, representing about 70% of the known interactions in this pathway. After stringent filtering of the raw dataset, the authors obtained 131 high-confidence interaction partners, including 80 new potential interactions. Using these data, the TNF-α/NF-κB pathway was studied further by knocking down 28 of its components with RNA interference (RNAi). The results of this study provided a better understanding of the cell signaling events triggered by TNF-α stimulation.

Knuesel et al. proposed an alternative TAP tag approach to purify protein complexes from mammalian cells [21]. They engineered a novel retroviral expression vector containing two protein A sequences, two copies of the TEV protease recognition sequence, and one FLAG tag sequence, which could be fused to the N-terminus of any protein. The authors replaced the classical CBP tag with a FLAG tag because they observed that many mammalian proteins interacted non-specifically with the calmodulin beads. Using their version of the TAP tag, they studied Smad3 interactions and identified HSP70 as a novel interaction partner.

Another variation of the TAP tag was recently introduced by Drakas et al. to improve the recovery of proteins from mammalian cells grown in monolayers [22]. The authors added a biotin tag (BT) at the N-terminus of the protein of interest in place of the CBP tag [23]. A second vector encoding Escherichia coli BirA was used together with the TAP-tagged construct to biotinylate the BT sequence. The biotinylated tag binds strongly to streptavidin, which gave higher yields of protein complexes. Using this protocol, Drakas et al. studied the interaction partners of insulin receptor substrate 1 (IRS-1) in R+ cells, which are derived from mouse embryo cells lacking IGF-1R and express human IGF-1R instead. By combining their modified TAP tag with liquid chromatography–tandem mass spectrometry (LC–MS/MS), they identified numerous IRS-1 interaction partners. It was estimated that this modified TAP tag increased the number of proteins recovered by a factor of 3. An improved version in which BT and BirA are encoded on a single vector was also described [22].

Fig. 2. Different strategies for affinity purification and protein–protein interaction identification: (a) epitope tagging, (b) chemical cross-linking, and (c) TAP tag approach. Cross-linking allows more stringent washing while preventing the loss of transient protein complexes.

2.1.5. Affinity purification drawbacks

An issue that is too often overlooked is the lack of antibody specificity. Even the most efficient monoclonal antibodies cross-react to some extent with background proteins, producing false-positives. In addition, abundant “sticky” proteins often bind to the solid support used in the purification, creating additional false-positives. Stringent washing conditions usually reduce the amount of background proteins observed, but they may also remove interaction partners that bind the bait with low affinity.
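
The washing trade-off described above can be illustrated with a simple kinetic model. This is a sketch under explicit assumptions, not a method from the cited studies: dissociation during washing is treated as a first-order process with off-rate k_off, rebinding of released prey is ignored, and the wash time and k_off values are hypothetical. Under these assumptions the fraction of a bait–prey complex remaining after a total wash time t is exp(-k_off * t), and because K_d = k_off / k_on, complexes with high dissociation constants (at similar association rates) are lost first.

```python
import math


def fraction_remaining(k_off_per_s: float, wash_time_s: float) -> float:
    """Fraction of bait-prey complex left after washing.

    Assumes first-order dissociation and no rebinding during the washes
    (a simplifying assumption, not a measured property of any complex).
    """
    if k_off_per_s < 0 or wash_time_s < 0:
        raise ValueError("rate constant and time must be non-negative")
    return math.exp(-k_off_per_s * wash_time_s)


def half_life_s(k_off_per_s: float) -> float:
    """Dissociation half-life, t1/2 = ln(2) / k_off."""
    return math.log(2) / k_off_per_s


if __name__ == "__main__":
    wash_time = 4 * 5 * 60  # hypothetical: four 5-min washes, in seconds
    # Hypothetical off-rates for a stable, an intermediate and a transient complex.
    for label, k_off in [("stable", 1e-5), ("intermediate", 1e-3), ("transient", 1e-1)]:
        f = fraction_remaining(k_off, wash_time)
        print(f"{label:12s} k_off={k_off:.0e}/s  "
              f"t1/2={half_life_s(k_off):10.1f} s  remaining={f:.3f}")
```

With these made-up numbers the stable complex is essentially unaffected while the transient one is lost completely, which is the situation that cross-linking before purification (Section 2.2) is intended to avoid.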

Simple repetition of immunoprecipitation experiments can increase experimental confidence, and in large-scale studies, clustering techniques may be used to identify and remove proteins that recur across many unrelated purifications and are therefore likely to be background contaminants. Another issue with affinity purification is possible interference from the epitope tag itself. This is mostly a concern with larger tags, such as TAP or GFP, but it cannot be dismissed entirely for smaller tags.

Methods that combine isotopic labeling with affinity purification have recently emerged and may overcome the problem of “dirty” pull-downs. For example, by combining single-step affinity purification, the isotope-coded affinity tag (ICAT) method [24], and mass spectrometry, Ranish et al. identified interaction partners of the large RNA polymerase II pre-initiation complex with high confidence [25]. Affinity purifications were performed using two strains of budding yeast: (1) a wild-type strain (sample) and (2) a strain carrying a temperature-sensitive mutant of the TATA-binding protein (control). Proteins were labeled with the ICAT reagent and the lysates were combined in a 1:1 ratio. After protein digestion, peptide purification, and mass spectrometry analysis, interaction partners specific to the wild-type pre-initiation complex were identified.
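
The two filtering ideas above, discarding proteins that recur across many unrelated pull-downs and keeping only preys that are enriched in the sample channel of an isotope-labeling experiment, can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure of any cited study: the function names, the 50% frequency cut-off, the 2-fold ratio threshold, and all protein names and values are hypothetical, and the sample/control ratios are assumed to have already been computed from the isotopically heavy and light peptide intensities.

```python
from collections import Counter


def frequent_contaminants(pulldowns: dict[str, set[str]], max_fraction: float) -> set[str]:
    """Proteins seen in more than `max_fraction` of all pull-downs.

    Proteins that co-purify with many unrelated baits are likely background
    ("sticky") proteins rather than specific interaction partners.
    """
    counts = Counter(p for preys in pulldowns.values() for p in preys)
    n = len(pulldowns)
    return {p for p, c in counts.items() if c / n > max_fraction}


def specific_by_ratio(ratios: dict[str, float], min_ratio: float) -> set[str]:
    """Preys enriched in the sample relative to the control.

    Background proteins bind similarly in both channels (ratio near 1), so
    only proteins at or above `min_ratio` are kept as candidate partners.
    """
    return {p for p, r in ratios.items() if r >= min_ratio}


if __name__ == "__main__":
    # Hypothetical pull-down results (bait -> identified preys).
    pulldowns = {
        "BaitA": {"P1", "P2", "TUBULIN", "ACTIN"},
        "BaitB": {"P3", "TUBULIN", "ACTIN"},
        "BaitC": {"P4", "P5", "ACTIN"},
        "BaitD": {"P6", "TUBULIN"},
    }
    print(sorted(frequent_contaminants(pulldowns, max_fraction=0.5)))

    # Hypothetical sample/control ratios for the preys of one bait.
    ratios = {"P1": 6.2, "P2": 1.1, "TUBULIN": 0.9}
    print(sorted(specific_by_ratio(ratios, min_ratio=2.0)))
```

In practice the cut-offs would be set from replicate experiments. A frequency filter alone can also discard genuine partners of highly connected proteins, which is one reason quantitative isotope labeling is attractive.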

Blagoev et al. [26] reported a similar isotope labeling strategy to analyze proteins that bind the SH2 domain of Grb2 in EGF-stimulated HeLa cells. Cells were grown in media containing light (¹²C₆) or heavy (¹³C₆) arginine using the stable isotope labeling by amino acids in cell culture (SILAC) technique [27]. Lysates from EGF-stimulated (¹³C₆-arginine) and unstimulated (¹²C₆-arginine) cells were then combined in a 1:1 ratio. Affinity purifications were performed with beads carrying a GST fusion of the human Grb2 SH2 domain, and after differential LC–MS/MS analysis, interaction partners of the SH2 domain were identified as the proteins enriched in the EGF-stimulated sample. Analogous methods were recently used by two groups studying the interaction partners of MyD88 [28] and the 26S proteasome [29]. In both cases, isotopic labeling enabled interaction partners to be distinguished from background contaminants.

2.2. Chemical cross-linking

Chemical cross-linking approaches for studying protein complexes have grown in popularity in recent years, both because of the challenges posed by the increasingly large complexes being discovered (e.g. the RNA polymerase II holoenzyme [30], the anaphase-promoting complex [31], the spliceosome [32], and the proteasome [33]) and because improvements in analytical methods such as mass spectrometry now allow better interpretation of chemically modified proteins [34,35]. Several methods for studying protein complexes by mass spectrometry using bi- and trifunctional cross-linkers have been described in the literature [36–38]. In this review, however, we focus solely on methods employing formaldehyde and glutaraldehyde, because only these cross-linkers permit the study of protein complexes under physiological conditions, through a process commonly referred to as “in vivo cross-linking”.

2.2.1. In vivo cross-linking

Several features make formaldehyde and glutaraldehyde particularly useful for studying native protein complexes. First, the cross-links they produce span a very short distance (see below), so proteins that become cross-linked must be in close physical proximity. Second, they are known to penetrate the cell membrane rapidly and are rather non-specific, which makes them useful for detecting a wide range of interactions. Third, they are known to inactivate enzymes almost immediately upon addition to growing cells, suggesting that they provide a snapshot of the interactions present at the time of addition [39]. Fourth, once the cross-linking reaction is complete, the products can be subjected to non-physiological conditions (e.g. during an affinity purification step) while maintaining their structural integrity. Finally, formaldehyde cross-links are reversible, enabling subsequent analysis of the individual components of the cross-linked complex by mass spectrometry [40].

Cross-linking with formaldehyde or glutaraldehyde prior to affinity purification and mass spectrometry analysis therefore permits the study of transient protein complexes that would otherwise escape analysis. Transient complexes, which are characterized by high dissociation constants, are lost under the conditions used to reduce background proteins and contaminants during an affinity purification step. These washing conditions are often quite stringent, relying on high salt or detergent concentrations to be effective. Even when milder conditions are used, repeated rounds of washing may still lead to the loss of components that are not tightly bound to one another (see Fig. 2b).

2.2.2. Cross-linking chemistry

Aldehyde-based cross-linkers are known to react with amino acid side chains that contain primary amino groups (e.g. lysine and arginine) and thiol groups (i.e. cysteine) to form a methylol derivative. For primary amino groups, the methylol group then partially undergoes a condensation reaction to an imine, also known as a Schiff base. The Schiff base can subsequently form cross-links with several amino acid residues through a stable methylene bridge that spans a distance of roughly 2–3 Å [41,42].

A recent study using a series of model peptides demonstrated that formaldehyde reacts with the N-terminal amino group and with the side chains of lysine, arginine, cysteine, and histidine [42]. Depending on the peptide sequence, three types of modifications were observed: (1) methylol groups, (2) Schiff bases, and (3) methylene bridges. Although the formation of methylol groups and Schiff bases is reversible, the authors obtained evidence for both modifications in several peptides by LC–MS. Their results also confirmed that only primary amino groups can form cross-links, in a second step, with arginine, asparagine, glutamine, tryptophan, histidine, and tyrosine residues. Although informative, cross-linking studies of free amino acids and peptides still cannot accurately predict formaldehyde-induced modifications in proteins or the extent to which intermolecular cross-links may form.
This is because the formation of cross-links between proteins is influenced by additional factors, such as pH, the position and local environment of reactive amino acid residues, the rate of a particular cross-linking reaction, the reactant concentrations, and other components present in the reaction solution.

2.2.3. Recent applications

Methods involving formaldehyde cross-linking and mass spectrometry have been described for the study of cytosolic and membrane protein complexes in various cell types. To analyze the interactions of the P1 adhesin protein in Mycoplasma pneumoniae, Layh-Schmitt et al. [43] treated cells with formaldehyde and isolated a cross-linked P1 adhesin complex by immunoaffinity chromatography. After the cross-links were reversed, the identity of several proteins cross-linked to P1 adhesin was confirmed by Western blotting with antibodies against known interaction partners. These include the adhesion-related 30 kDa protein, the P40 and P90 membrane proteins, the cytoskeletal P65 protein, and two cytoskeleton-forming proteins, HMW1 and HMW3. Proteins cross-linked to P1 adhesin were also separated by SDS-PAGE and analyzed by MALDI-TOF MS to confirm the identity of these six proteins. Further analysis of the cross-linked complex revealed two additional proteins, the chaperone DnaK and the E1α subunit of pyruvate dehydrogenase.

To identify the interaction partners of the cellular prion protein (PrPC), Schmitt-Ulms et al. [44] subjected mouse neuroblastoma cells to mild formaldehyde treatment. After cross-linking, Western blotting with recombinant Fabs against PrPC revealed high molecular weight protein complexes in the 200 kDa range. Immunoprecipitation of the cross-linked complexes followed by LC–MS/MS analysis identified three splice variants of the neural cell adhesion molecule (N-CAM). Using an N-CAM-specific peptide library, the authors then localized the PrP-binding site to two consecutive fibronectin modules close to the membrane-attachment site of N-CAM. In another PrP-related study, Schmitt-Ulms et al. [45] described a method that combines transcardiac perfusion and formaldehyde cross-linking to study protein complexes in living tissues. Applied to mouse brain, this method enabled the purification and identification of more than 20 membrane proteins residing in the vicinity of PrPC, many of which have been implicated in cell adhesion and neuritic outgrowth and harbor fibronectin-like motifs. To validate the protocol, high salt and detergent concentrations were used during the purification of a low-abundance enzymatic complex, γ-secretase. Despite the stringent washing conditions, LC–MS/MS analysis still confirmed the presence of several of its known components, including aph-1, presenilin-1, and nicastrin.

Hall and Struhl used mild formaldehyde cross-linking coupled to immunoprecipitation to determine whether the TATA-binding protein TBP, the transcription factor TFIIB, and the SAGA histone acetylase complex interact with the VP16 transcriptional activator in S. cerevisiae [39].
Immunoprecipitations were performed on cross-linked samples from strains carrying the Gal-VP16 plasmid and expressing epitope-tagged versions of TBP, TFIIB, TFIIA, and 24 other proteins, including numerous components of the RNA polymerase II holoenzyme. Western blot analysis with antibodies recognizing the tagged proteins was then used to confirm the in vivo interactions.

Vasilescu et al. described a generic approach for studying protein complexes in mammalian cells that combines formaldehyde cross-linking, epitope tagging, immunoaffinity chromatography, and mass spectrometry [40]. Applying this method, they identified numerous proteins that co-purified with a Myc-tagged Ras GTPase after cross-linking. Co-immunoprecipitation and Western blot analysis confirmed a number of these interactions; among them, the RasGAP-related protein IQGAP1 was shown to be a novel interaction partner of M-Ras. This method is expected to aid the study of many protein complexes, making it possible to examine individual subunits and to approximate the relative molar amounts of proteins that immunopurify with a given protein of interest. Furthermore, combining this method with isotopic labeling techniques may provide further insights into the regulation and dynamics of protein complexes in various cell types and under different physiological conditions.
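
The formaldehyde modifications listed in Section 2.2.2 shift peptide masses by fixed amounts, which is what makes them detectable by mass spectrometry. The sketch below computes monoisotopic [M+H]+ values for an unmodified peptide, its methylol form (+CH2O, about +30.011 Da), its Schiff-base form (+CH2O - H2O, +12.000 Da), and two peptides joined by a methylene bridge (the sum of both peptides +12.000 Da). The mass shifts follow from the elemental changes in the reactions described above; the peptide sequences are hypothetical, and the sketch ignores multiply modified peptides, higher charge states, and any search-engine specifics.

```python
# Monoisotopic masses (Da).
H = 1.00782503
O = 15.99491462
C = 12.0
PROTON = 1.00727646
WATER = 2 * H + O
FORMALDEHYDE = C + 2 * H + O  # CH2O

# Monoisotopic residue masses of the standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

METHYLOL = FORMALDEHYDE             # +CH2O, about +30.011 Da
SCHIFF_BASE = FORMALDEHYDE - WATER  # +CH2O - H2O = +12.000 Da (one C atom)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def mh(mass: float) -> float:
    """Singly protonated [M+H]+ value for a neutral mass."""
    return mass + PROTON


def crosslinked_mass(seq1: str, seq2: str) -> float:
    """Two peptides joined by one methylene bridge: M1 + M2 + CH2O - H2O."""
    return peptide_mass(seq1) + peptide_mass(seq2) + SCHIFF_BASE


if __name__ == "__main__":
    pep_a, pep_b = "GKLR", "ACDEK"  # hypothetical peptides
    base = peptide_mass(pep_a)
    print(f"{pep_a} unmodified   [M+H]+ = {mh(base):.4f}")
    print(f"{pep_a} methylol     [M+H]+ = {mh(base + METHYLOL):.4f}")
    print(f"{pep_a} Schiff base  [M+H]+ = {mh(base + SCHIFF_BASE):.4f}")
    print(f"{pep_a}-CH2-{pep_b} [M+H]+ = {mh(crosslinked_mass(pep_a, pep_b)):.4f}")
```

A Schiff base and an intramolecular methylene bridge both add exactly one carbon atom (+12.000 Da), so on a single peptide these two modifications cannot be distinguished by mass alone.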


Guerrero et al. recently applied formaldehyde cross-linking and isotopic labeling to study the interaction partners of the 26S proteasome in yeast [29]. Affinity purification after formaldehyde cross-linking was combined with SILAC and mass spectrometry analysis to identify and quantify specific interactions. Using this method, they identified the full composition of the yeast 26S proteasome complex as well as two known ubiquitin receptors, Rad23 and Dsk2. In addition, they reported a total of 64 potential proteasome-interacting proteins, 42 of which were novel interactions.

3. Bioinformatics

As more experimental procedures are developed for large-scale studies of protein–protein interactions, there is an increasing need for tools that can predict, validate, and manage the large interaction datasets they produce. Predictive tools can be used to design more focused experiments, while validation can extract meaningful interactions from large datasets and increase confidence in the results obtained. Management tools facilitate access to the data and allow meaningful representation of the interaction networks identified.

3.1. Interaction prediction

Protein–protein interactions can be predicted with varying degrees of certainty depending on the starting information available, which may include three-dimensional (e.g. X-ray) structures, protein similarity, or primary sequence alone. The methods developed for predicting protein–protein interactions can be divided into three categories: (1) sequence-based [46]; (2) evolution-based [47,48]; and (3) structure-based [49]. Using a sequence-based technique, Ofran and Rost [46] studied known interaction partners in the non-redundant PDB database [50] to define the environment around contact residues. Proteins of unknown structure were then compared with these contact-residue environments to identify probable interaction sites.
Ofran and Rost found that 80% of contact residues had five or more other interacting residues in their vicinity, an observation that enabled a slight improvement over purely random prediction. Using an evolutionary-based technique, Lichtarge and Sowa [47] assumed that, because protein function is conserved throughout evolution, interaction sites should be highly conserved. By comparing different members of a given protein family, they were able to identify conserved residues in 80% of the proteins studied. Wodak and Mendez [49] evaluated the results of the third edition of the Critical Assessment of Predicted Interactions (CAPRI) contest, in which laboratories were given the individual X-ray structures of the two proteins in each interacting pair and asked to find the interaction sites. Laboratories that allowed for even limited conformational changes achieved better results than those that did not. All three approaches will benefit from the growing amount of available protein information, since their algorithms are based on observed trends: the more proteins available, the better the statistics. As the quality of the predictions increases, they could be used to direct focused experiments and to validate experimental results. For example, Friedhoff [35] recently demonstrated a strategy in which the results obtained from chemical cross-linking were further analyzed using bioinformatics tools.

3.2. Interaction validation

Another dimension of bioinformatics for the study of protein interactions is the validation of experimental results. Validation is necessary not only to avoid false positives but also to remove the inevitable duplication of information in existing databases. This is a significant issue for human protein databases because no nomenclature convention exists. With an estimated 375,000 human protein interactions, the use of different names for the same protein in the literature may complicate the data unnecessarily; as a result, scientists may be reproducing results without realizing it. Ramani et al. [51] developed a bioinformatics tool to identify existing interactions described in Medline abstracts. They discarded redundant information by linking each protein to its gene ID, and they also predicted new interactions by identifying protein pairs that were frequently cited together in the literature. Based on their results, they estimated that only 10% of the expected human protein interactions had been cited in the literature.

Lappe and Holm recently suggested a strategy for optimal coverage of interactions using affinity purification techniques [52]. It consists of selecting appropriate baits based on the results of previous pull-downs. For example, proteins detected in multiple pull-downs are more likely to be highly connected to other proteins and may enable the detection of numerous interactions. Butland et al. [53] studied the E. coli interactome using affinity purification in combination with mass spectrometry. Because the proteins involved take part in highly conserved biochemical processes, they used the results to predict and validate interactions in other genomes. This method was also used to identify new interactions. Park et al.
[54] predicted interactions between protein families for 146 species. They studied highly conserved interactions related to basic biochemical processes such as protein translation, DNA binding, and ATP metabolism. von Mering et al. analyzed large-scale protein–protein interaction datasets produced by various techniques, including mass spectrometry-based approaches. Among their results, they showed that each technique is biased toward different functional classes of proteins, so that the techniques complement one another [55].

3.3. Interaction management

Results from interaction mapping experiments may be stored in different databases, most of which are freely available on the internet. The available databases have been reviewed elsewhere [56,57]. A summary of available databases and bioinformatics tools for protein interactions is shown in Table 1. Many different tools have been developed to visualize results and to extract meaningful information from these databases. Breitkreutz et al. [58] developed a tool called Osprey to visualize extracted interaction networks. A useful feature allows interaction networks to be color-coded according to Gene Ontology (GO) annotations [59] or according to the experimental source of the results. A search engine allows the user to browse proteins or to filter interactions according to various parameters. STRING is another visualization tool that automatically builds interaction networks around a given protein. The user can move proteins around or let STRING optimize the layout using an analogy to string and beads, and can also extend the network around proteins already included. HPID, HPRD, and MINT are databases that also have visualization capabilities.

An important limitation in the current way of accessing protein interaction datasets is the lack of linking through NCBI. For most users, the interaction databases remain obscure, and accessibility would improve significantly if interaction databases were indexed in NCBI. Inconsistencies in data extraction that lead to the removal of subsets of data from published experiments are also problematic, because the only other way to find these interactions is a manual search through the literature. Finally, protein interaction websites are often not user-friendly, and casual users interested in a specific protein or set of proteins may be unable to retrieve the desired information.

Table 1
List of available bioinformatic tools for protein interactions

Name | Description | Internet address
ADVICE [60] | Evolutionary prediction | http://advice.i2r.a-star.edu.sg/
BIND [61] | Experimentally derived interactions | http://www.bind.ca/
ClusPro [62,63] | Structure-based prediction | http://nrc.bu.edu/cluster/
DIP [64] | Experimentally derived interactions | http://dip.doe-mbi.ucla.edu/
GRID [65] | Experimentally derived interactions | http://biodata.mshri.on.ca/grid/servlet/Index
HPID | BIND, DIP, and HPRD results | http://wilab.inha.ac.kr/hpid/
HPRD [66] | Experimentally derived interactions | http://www.hprd.org/
MINT [67] | Experimentally derived interactions | http://mint.bio.uniroma2.it/mint/
MPPI [68] | Experimentally derived interactions | http://mips.gsf.de/proj/ppi/
STRING [69] | Experimentally derived interactions and evolutionary prediction | http://string.embl.de/

4. Conclusion

Over the last five years, the mapping of protein–protein interactions has matured through the introduction of different techniques as well as enhanced bioinformatics tools. Large-scale mapping of protein–protein interactions will be a great source of novel material for different fields of research. It will be important to improve the quality of the information as well as the interfaces used to access it. In particular, the low degree of overlap between datasets generated by different techniques is problematic and will need to be addressed. In addition, current large-scale experiments lack information about the dynamics of interactions.
In our opinion, mapping as well as studying the dynamics of protein interactions, complexes, and pathways in the proper cellular context will provide an improved understanding of biological systems.

Acknowledgements

D.F. would like to acknowledge financial support from the Canadian Foundation for Innovation, the Canada Research Chair Program, the National Research Council of Canada, the University of Ottawa, MDS Inc., and the Ontario Genomics Institute.

References

[1] C.T. Chien, P.L. Bartel, R. Sternglanz, S. Fields, Proc. Natl. Acad. Sci. U.S.A. 88 (1991) 9578.
[2] M. Barrios-Rodiles, K.R. Brown, B. Ozdamar, R. Bose, Z. Liu, R.S. Donovan, F. Shinjo, Y. Liu, J. Dembowy, I.W. Taylor, V. Luga, N. Przulj, M. Robinson, H. Suzuki, Y. Hayashizaki, I. Jurisica, J.L. Wrana, Science 307 (2005) 1621.
[3] H. Steen, M. Mann, Nat. Rev. Mol. Cell Biol. 5 (2004) 699.
[4] R. Aebersold, D.R. Goodlett, Chem. Rev. 101 (2001) 269.
[5] R. Aebersold, M. Mann, Nature 422 (2003) 198.
[6] M. Mann, R.C. Hendrickson, A. Pandey, Annu. Rev. Biochem. 70 (2001) 437.
[7] J.J. Hill, M.V. Davies, A.A. Pearson, J.H. Wang, R.M. Hewick, N.M. Wolfman, Y. Qiu, J. Biol. Chem. 277 (2002) 40735.
[8] J. Eilbracht, S. Kneissel, A. Hofmann, M.S. Schmidt-Zachmann, Eur. J. Cell Biol. 84 (2005) 279.
[9] N.S. Lipman, L.R. Jackson, L.J. Trudel, F. Weis-Garcia, ILAR J. 46 (2005) 258.
[10] T.P. Hopp, K.S. Prickett, V.L. Price, R.T. Libby, C.J. March, D.P. Ceretti, D.L. Urdal, P.J. Conlon, Biotechnol. (N. Y.) 6 (1988) 1204.
[11] A. Einhauer, A. Jungbauer, J. Biochem. Biophys. Methods 49 (2001) 455.
[12] K. Terpe, Appl. Microbiol. Biotechnol. 60 (2003) 523.
[13] T.P. Hopp, B. Gallis, K.S. Prickett, Mol. Immunol. 33 (1996) 601.
[14] A. Einhauer, A. Jungbauer, 20th International Symposium on the Separation of Proteins, Peptides, and Polynucleotides (ISPPP), 2000.
[15] Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, M. Tyers, Nature 415 (2002) 180.
[16] G.I. Evan, G.K. Lewis, G. Ramsay, J.M. Bishop, Mol. Cell. Biol. 5 (1985) 3610.
[17] J.H. Seol, A. Shevchenko, A. Shevchenko, R.J. Deshaies, Nat. Cell Biol. 3 (2001) 384.
[18] G. Rigaut, A. Shevchenko, B. Rutz, M. Wilm, M. Mann, B. Seraphin, Nat. Biotechnol. 17 (1999) 1030.
[19] A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, G. Superti-Furga, Nature 415 (2002) 141.
[20] T. Bouwmeester, A. Bauch, H. Ruffner, P.O. Angrand, G. Bergamini, K. Croughton, C. Cruciat, D. Eberhard, J. Gagneur, S. Ghidelli, C. Hopf, B. Huhse, R. Mangano, A.M. Michon, M. Schirle, J. Schlegl, M. Schwab, M.A. Stein, A. Bauer, G. Casari, G. Drewes, A.C. Gavin, D.B. Jackson, G. Joberty, G. Neubauer, J. Rick, B. Kuster, G. Superti-Furga, Nat. Cell Biol. 6 (2004) 97.
[21] M. Knuesel, Y. Wan, Z. Xiao, E. Holinger, N. Lowe, W. Wang, X. Liu, Mol. Cell. Proteomics 2 (2003) 1225.
[22] R. Drakas, M. Prisco, R. Baserga, Proteomics 5 (2005) 132.
[23] P.J. Schatz, Biotechnol. (N. Y.) 11 (1993) 1138.
[24] S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Nat. Biotechnol. 17 (1999) 994.
[25] J.A. Ranish, E.C. Yi, D.M. Leslie, S.O. Purvine, D.R. Goodlett, J. Eng, R. Aebersold, Nat. Genet. 33 (2003) 349.
[26] B. Blagoev, I. Kratchmarova, S.E. Ong, M. Nielsen, L.J. Foster, M. Mann, Nat. Biotechnol. 21 (2003) 315.
[27] S.E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey, M. Mann, Mol. Cell. Proteomics 1 (2002) 376.
[28] T. Wang, S. Gu, T. Ronni, Y.C. Du, X. Chen, J. Proteome Res. 4 (2005) 941.
[29] C. Guerrero, C. Tagwerker, P. Kaiser, L. Huang, Mol. Cell. Proteomics (2005), published online November 10, 2005.
[30] Y. Li, S. Bjorklund, Y.J. Kim, R.D. Kornberg, Methods Enzymol. 273 (1996) 172.
[31] W. Zachariae, A. Shevchenko, P.D. Andrews, R. Ciosk, M. Galova, M.J. Stark, M. Mann, K. Nasmyth, Science 279 (1998) 1216.
[32] G. Neubauer, A. King, J. Rappsilber, C. Calvio, M. Watson, P. Ajuh, J. Sleeman, A. Lamond, M. Mann, Nat. Genet. 20 (1998) 46.
[33] S.J. Russell, S.H. Reed, W. Huang, E.C. Friedberg, S.A. Johnston, Mol. Cell 3 (1999) 687.
[34] D.A. Fancy, Curr. Opin. Chem. Biol. 4 (2000) 28.
[35] P. Friedhoff, Anal. Bioanal. Chem. 381 (2005) 78.
[36] D.R. Muller, P. Schindler, H. Towbin, U. Wirth, H. Voshol, S. Hoving, M.O. Steinmetz, Anal. Chem. 73 (2001) 1927.
[37] M. Trester-Zedlitz, K. Kamada, S.K. Burley, D. Fenyo, B.T. Chait, T.W. Muir, J. Am. Chem. Soc. 125 (2003) 2416.
[38] X. Tang, G.R. Munske, W.F. Siems, J.E. Bruce, Anal. Chem. 77 (2005) 311.
[39] D.B. Hall, K. Struhl, J. Biol. Chem. 277 (2002) 46043.
[40] J. Vasilescu, X. Guo, J. Kast, Proteomics 4 (2004) 3845.
[41] V. Orlando, H. Strutt, R. Paro, Methods 11 (1997) 205.
[42] B. Metz, G.F. Kersten, P. Hoogerhout, H.F. Brugghe, H.A. Timmermans, A. de Jong, H. Meiring, J. ten Hove, W.E. Hennink, D.J. Crommelin, W. Jiskoot, J. Biol. Chem. 279 (2004) 6235.
[43] G. Layh-Schmitt, A. Podtelejnikov, M. Mann, Microbiology 146 (Pt 3) (2000) 741.
[44] G. Schmitt-Ulms, G. Legname, M.A. Baldwin, H.L. Ball, N. Bradon, P.J. Bosque, K.L. Crossin, G.M. Edelman, S.J. DeArmond, F.E. Cohen, S.B. Prusiner, J. Mol. Biol. 314 (2001) 1209.

[45] G. Schmitt-Ulms, K. Hansen, J. Liu, C. Cowdrey, J. Yang, S.J. DeArmond, F.E. Cohen, S.B. Prusiner, M.A. Baldwin, Nat. Biotechnol. 22 (2004) 724.
[46] Y. Ofran, B. Rost, FEBS Lett. 544 (2003) 236.
[47] O. Lichtarge, M.E. Sowa, Curr. Opin. Struct. Biol. 12 (2002) 21.
[48] F. Glaser, T. Pupko, I. Paz, R.E. Bell, D. Bechor-Shental, E. Martz, N. Ben-Tal, Bioinformatics 19 (2003) 163.
[49] S.J. Wodak, R. Mendez, Curr. Opin. Struct. Biol. 14 (2004) 242.
[50] Y. Ofran, B. Rost, J. Mol. Biol. 325 (2003) 377.
[51] A.K. Ramani, R.C. Bunescu, R.J. Mooney, E.M. Marcotte, Genome Biol. 6 (2005) R40.
[52] M. Lappe, L. Holm, Nat. Biotechnol. 22 (2004) 98.
[53] G. Butland, J.M. Peregrin-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, N. Krogan, M. Davey, J. Parkinson, J. Greenblatt, A. Emili, Nature 433 (2005) 531.
[54] D. Park, S. Lee, D. Bolser, M. Schroeder, M. Lappe, D. Oh, J. Bhak, Bioinformatics 21 (2005) 3234.
[55] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, P. Bork, Nature 417 (2002) 399.
[56] L. Salwinski, D. Eisenberg, Curr. Opin. Struct. Biol. 13 (2003) 377.
[57] A. Droit, G.G. Poirier, J.M. Hunter, J. Mol. Endocrinol. 34 (2005) 263.
[58] B.J. Breitkreutz, C. Stark, M. Tyers, Genome Biol. 4 (2003) R22.
[59] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, Nat. Genet. 25 (2000) 25.
[60] S.H. Tan, Z. Zhang, S.K. Ng, Nucleic Acids Res. 32 (2004) W69.
[61] G.D. Bader, D. Betel, C.W. Hogue, Nucleic Acids Res. 31 (2003) 248.
[62] S.R. Comeau, D.W. Gatchell, S. Vajda, C.J. Camacho, Nucleic Acids Res. 32 (2004) W96.
[63] S.R. Comeau, D.W. Gatchell, S. Vajda, C.J. Camacho, Bioinformatics 20 (2004) 45.
[64] I. Xenarios, D.W. Rice, L. Salwinski, M.K. Baron, E.M. Marcotte, D. Eisenberg, Nucleic Acids Res. 28 (2000) 289.
[65] B.J. Breitkreutz, C. Stark, M. Tyers, Genome Biol. 4 (2003) R23.
[66] S. Peri, J.D. Navarro, R. Amanchy, T.Z. Kristiansen, C.K. Jonnalagadda, V. Surendranath, V. Niranjan, B. Muthusamy, T.K. Gandhi, M. Gronborg, N. Ibarrola, N. Deshpande, K. Shanker, H.N. Shivashankar, B.P. Rashmi, M.A. Ramya, Z. Zhao, K.N. Chandrika, N. Padma, H.C. Harsha, A.J. Yatish, M.P. Kavitha, M. Menezes, D.R. Choudhury, S. Suresh, N. Ghosh, R. Saravana, S. Chandran, S. Krishna, M. Joy, S.K. Anand, V. Madavan, A. Joseph, G.W. Wong, W.P. Schiemann, S.N. Constantinescu, L. Huang, R. Khosravi-Far, H. Steen, M. Tewari, S. Ghaffari, G.C. Blobe, C.V. Dang, J.G. Garcia, J. Pevsner, O.N. Jensen, P. Roepstorff, K.S. Deshpande, A.M. Chinnaiyan, A. Hamosh, A. Chakravarti, A. Pandey, Genome Res. 13 (2003) 2363.
[67] A. Zanzoni, L. Montecchi-Palazzi, M. Quondam, G. Ausiello, M. Helmer-Citterich, G. Cesareni, FEBS Lett. 513 (2002) 135.
[68] P. Pagel, S. Kovac, M. Oesterheld, B. Brauner, I. Dunger-Kaltenbach, G. Frishman, C. Montrone, P. Mark, V. Stumpflen, H.W. Mewes, A. Ruepp, D. Frishman, Bioinformatics 21 (2005) 832.
[69] B. Snel, G. Lehmann, P. Bork, M.A. Huynen, Nucleic Acids Res. 28 (2000) 3442.