CHAPTER TWO
Quantitative Proteomics of the E. coli Membranome
K.C. Tsolis, A. Economou¹
KU Leuven, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, Leuven, Belgium
¹Corresponding author: e-mail address:
[email protected]
Contents
1. Introduction: Pipeline Overview
2. Sample Preparation
   2.1 Introduction
   2.2 Preparation of IMVs
   2.3 MS Sample Preparation
3. Peptide/Protein Identification
   3.1 Introduction
   3.2 MS Analysis
   3.3 Peptide/Protein Identification
4. Quantification
   4.1 Introduction
   4.2 Spectral-Based Quantification of Membrane Proteins
   4.3 Intensity-Based Label-Free Quantification of Membrane Proteins
5. Functional Annotation
6. Conclusion
Acknowledgments
References
Abstract
Due to the physicochemical properties of membrane proteins, their proteomic analysis often requires extensive sample preparation protocols that result in sample loss and introduce technical variation. Several methods for membrane proteomics have been described, designed to meet the needs of specific sample types and experimental designs. Here, we present a complete membrane proteomics pipeline, from membrane sample preparation to protein identification and quantification, and also discuss the annotation of proteomics data. The protocol has been developed using Escherichia coli samples but is directly adaptable to other bacteria, including pathogens. We describe a method for the preparation of E. coli inner membrane vesicles (IMVs), which is central to our pipeline. IMVs are functional membrane vesicles that can also be used for biochemical studies. Next, we propose methods for membrane protein digestion and describe alternative experimental approaches that have been previously tested in our lab.
Methods in Enzymology, Volume 586 ISSN 0076-6879 http://dx.doi.org/10.1016/bs.mie.2016.09.026
© 2017 Elsevier Inc. All rights reserved.
We highlight a surface proteolysis protocol for the identification of inner membrane and membrane-bound proteins. This is a simple, fast, and reproducible method for membrane sample characterization that has been previously used for the characterization of the E. coli inner membrane proteome (Papanastasiou et al., 2013) and the experimental validation of the E. coli membrane proteome (Orfanoudaki & Economou, 2014). It reduces the MS time required per sample and therefore allows for multiple repeats. We then discuss membrane protein quantification approaches and tools that can be used for the functional annotation of identified proteins. Overall, membrane proteome quantification can be fast, simplified, and reproducible; however, optimization steps should be performed for a given sample type.
1. INTRODUCTION: PIPELINE OVERVIEW
Membrane proteins comprise a significant portion of the total proteome, accounting for approximately 20–30% of the genes encoded in various organisms (Orfanoudaki & Economou, 2014; Wallin & von Heijne, 1998). Another 12% of the Escherichia coli proteome associates peripherally with membranes by docking on membrane proteins and/or lipids (Papanastasiou et al., 2016). In addition, membrane proteins participate in various essential biological functions, including cell signaling, solute transport, protein secretion, and cell communication, highlighting the importance of this subproteome.
Proteomic characterization of membranes is challenging, due to their physicochemical properties and the incompatibility of various detergents/chemicals with mass spectrometric analysis. In contrast to soluble proteins, membrane proteins must be extracted from membranes with detergents or organic solvents, and they remain in solution only when surrounded by amphiphilic detergent molecules. This treatment results in laborious protocols with increased sample loss. Several approaches have been developed so far, following either the classical gel-based protocols (Rabilloud, 2009) or solution-based methods using detergents, organic solvents, or membrane “shaving” (Speers & Wu, 2007; Wu, MacCoss, Howell, & Yates, 2003; Zhang, 2015).
In this chapter, we present a complete protocol for MS-based membrane proteomics analysis developed on the E. coli model system, from membrane sample preparation to data analysis. Our goal is to describe standard procedures for each experimental part that were previously tested in our lab, and to propose alternative options. We start by describing the membrane sample preparation procedure and continue with protein digestion protocols (Section 2). Next, we list standard LC–MS/MS analysis parameters and software platforms that can be used for protein identification (Section 3). Then we introduce concepts of protein quantification and describe representative examples (Section 4). Last, we present popular bioinformatics tools that can be used for the functional characterization of the proteomics results (Section 5). This pipeline should be widely applicable to any bacterium (Fig. 1).
Fig. 1 Membrane proteomics workflow. Our membrane proteomics workflow can be separated into four blocks: sample preparation, MS analysis and peptide/protein identification, quantification and selection of candidate proteins, and functional annotation of the selected proteins. Representative protocols previously tested in the lab are described in each section, and complementary methodologies are discussed. In addition, we present the analysis of an example dataset, from the isolation of IMVs to the selection of proteins with differential abundance.
2. SAMPLE PREPARATION
2.1 Introduction
Sample preparation for membrane proteomics can be separated into two parts: (a) the preparation of E. coli cell membranes and (b) the preparation of peptide samples for mass spectrometric analysis. Biochemical fractionation of membranes reduces protein complexity, improving the peptide identification rates and the reproducibility of proteomics results. In addition, extensive washing and/or chemical treatment of the membrane preparations (e.g., carbonate, urea, salt treatment) removes peripherally associated or cytoplasmic proteins that would otherwise be identified on the membranes. For the study of the inner membrane proteome of E. coli, we initially prepare inner membrane vesicles (IMVs), which are further treated with a chaotrope (commonly urea) to reduce the amount of peripherally associated proteins if needed (Futai, 1974; Papanastasiou et al., 2013; Pieper et al., 2009). IMVs can then be analyzed with various proteomics workflows, depending on the goals of the study.
In this section, we present (a) the protocol for the preparation of IMVs (Papanastasiou et al., 2013), (b) two sample preparation methods for the proteomic characterization of the membranes, and (c) alternative solutions that have been previously tested in our lab (Fig. 2). The first sample preparation method is surface proteolysis of E. coli IMVs, a simple and fast experimental approach that requires low amounts of membrane sample. Modifications of the basic principle of membrane surface “shaving” have been used in the past, optimized for different membrane proteins (Solis & Cordwell, 2011; Speers & Wu, 2007; Wu et al., 2003). This protocol was also used for the study of the inner membrane proteome of E. coli K-12 (Papanastasiou et al., 2013, 2016). The second method consists of complete solubilization of cell membranes or IMVs using detergents, separation by one-dimensional SDS-PAGE, and in-gel digestion. Although more elaborate, this approach is reliable and can increase the peptide coverage of membrane proteins.
Fig. 2 Sample preparation workflow. E. coli cells are lysed using a French press at 16,000 psi. The membrane fraction is isolated and further fractionated across a sucrose gradient to purify inner membranes. Chemical treatment of the membranes reduces the amount of peripherally associated proteins on the IMVs and is recommended depending on the application. Different methods can be applied for the digestion of membrane proteins and their preparation for MS analysis. Surface proteolysis in-solution provides a simple, fast, and reproducible approach for the study of membrane and membrane-associated proteins and results in a single MS run. SDS-PAGE combined with in-gel digestion is also a technically simple method for untargeted membrane characterization. Additional methods, combining solubilization with detergents and digestion using the FASP protocol (Wisniewski, Zougman, & Mann, 2009), or solubilization in nonionic detergents, digestion, and OFFGEL fractionation, can also be applied to membrane samples.
2.2 Preparation of IMVs
Buffers
• Buffer A (50 mM Tris/HCl, pH 8.0; 20% glycerol)
• Buffer B (50 mM Tris/HCl, pH 8.0; 50 mM KCl; 5 mM MgCl2)
Equipment
• French press FA078 (SLM-AMINCO/THERMO), with the 35-mL FA-032 pressure cell (use precooled at 4°C).
• Beckman Coulter Optima XPN-80 Ultracentrifuge, with a swing-out SW 32Ti rotor and 38.5 mL polypropylene tubes (Cat. 3268233) or 17 mL polypropylene tubes (Cat. 337986), or with a fixed-angle Type 45Ti rotor and 70 mL polycarbonate tubes (Cat. 355655).
• Beckman Coulter Avanti J-26S XPI, with fixed-angle rotor JLA-8.1000 and 1 L tubes (Cat. A98814).
• Dounce homogenizer (15 mL; Cat. 40415; Active Motif, La Hulpe, Belgium).
• Lipid extruder (LiposoFast-Basic; AVESTIN Europe, Mannheim, Germany).
Procedure
1. Inoculate 15 L of LB with 150 mL of an overnight culture of the E. coli strain of interest, using the appropriate antibiotics and growth temperature.
2. Grow cells until OD600 = 1.5 and harvest via centrifugation at 4500 × g for 15 min at 4°C, optimally using a high-capacity rotor.
3. Resuspend the cell pellet obtained (approximately 40 g of cells) in Buffer A to a final volume of 40 mL and add 5 μg/mL DNase, 100 μg/mL RNase, 2.5 mM MgCl2, and 1 mM PMSF. Keep at 4°C.
4. Break cells using a French press at 16,000 psi at 4°C, using a flow rate of 15 drops per minute. Pass cells 4–5 times.
5. Dilute the sample solution 4- to 5-fold with Buffer A and sediment unbroken cells via centrifugation (3000 × g; 10 min; 4°C).
6. Collect the supernatant and sediment the membranes it contains via ultracentrifugation (100,000 × g; 90 min; 4°C) using a high-capacity fixed-angle rotor.
7. Resuspend and homogenize the membrane pellet in 12 mL of Buffer A, using a Dounce homogenizer.
8. Prepare a five-step sucrose gradient (1.9, 1.7, 1.5, 1.3, and 1.1 M sucrose in 50 mM Tris/HCl, pH 8.0; 6 mL each) for membrane fractionation.
9. Load the membrane solution on top of the sucrose gradient and centrifuge at 75,000 × g and 4°C for 14 h in a swing-out rotor.
10. The following day, collect the IMVs (the dark-brown ring) from the second of the five fractions, counting from the top (7–15 mL from the top of the layer) (Fig. 2B).
11. Dilute the solution in Buffer A to wash away excess sucrose and reharvest the membranes via ultracentrifugation (100,000 × g; 90 min; 4°C; see step 6).
12. Resuspend the membrane pellet in 8 mL Buffer B and homogenize using a Dounce homogenizer on ice.
13. At this step, membranes can be chemically treated (e.g., sodium carbonate or urea treatment), depending on the experimental conditions, to remove peripherally associated proteins if desirable. This treatment reproducibly results in more membrane proteins being identified (Papanastasiou et al., 2013).
14. After chemical treatment, place the membrane solution on top of a sucrose cushion (0.2 M sucrose; 50 mM Tris/HCl, pH 8.0; 50 mM KCl) and pellet the membranes via centrifugation (100,000 × g; 30 min; 4°C) in a swing-out rotor.
15. Collect the membrane pellet, homogenize it in 1–2 mL of Buffer B, and then pass the homogenized solution through a lipid extruder, performing 21 passes through a 100-μm filter.
16. IMVs can be stored in aliquots at −80°C until use.
2.3 MS Sample Preparation
2.3.1 Surface Proteolysis Protocol
Materials
• 50 mM Ammonium bicarbonate solution (ABS) (Sigma, Cat. 09830)
• 1 mM Tris(2-carboxyethyl)phosphine (TCEP, AppliChem, Cat. A2233)
• 10 mM Iodoacetamide (IAA, AppliChem, Cat. A1666)
• Trypsin Gold, MS grade (Promega, Fitchburg, Wisconsin; Cat. V5280)
• Trifluoroacetic acid (TFA; 99%; Cat. T6508; Sigma-Aldrich)
Equipment
• Beckman Coulter Optima™ Max-XP table-top ultracentrifuge, using a TLA-100 rotor and 7 × 21 mm polycarbonate tubes (Cat. 343775)
• Rotational vacuum concentrator (Univapo 150 ECH, Montreal Biotech, Dorval, Canada), connected to an FTS Vapor Trap operating at −80°C and an Edwards RV3 vacuum pump.
Procedure
1. Thaw an aliquot of IMVs, dilute the sample in 50 mM Tris/HCl, pH 8.0; 50 mM KCl; 5 mM MgCl2, and estimate the total protein content of the membranes using the bicinchoninic acid protocol (Smith et al., 1985), using BSA to prepare a 1–10 mg/mL standard curve (Thermo Scientific, Cat. 23225).
2. Aliquot 10 μg of total IMV protein into a new 1.5-mL Eppendorf tube and adjust the volume to 100 μL using 50 mM ABS.
3. Reduce disulfide bonds with 1 mM TCEP and alkylate with 10 mM IAA (30 min; 22°C; dark).
4. Digest the protein solution with 0.1 μg trypsin (1/100 enzyme-to-protein ratio) at 37°C, overnight, with shaking.
5. After digestion, pellet the membranes by ultracentrifugation (200,000 × g; 4°C; 30 min) and collect the supernatant.
6. Acidify the solution with 2 μL TFA (until pH < 2; test using pH paper), dry the sample down to 2–5 μL (22°C; 1250 rpm), and proceed with the desalting step (see Section 2.3.3).
2.3.2 SDS-PAGE and In-Gel Digestion Protocol
Materials
• SDS sample buffer (0.35 M Tris/HCl, pH 6.8; 0.1% SDS; 30% glycerol; 5% β-mercaptoethanol; prepared as a 5× stock)
• Colloidal blue Coomassie
• Coomassie G250 (Sigma; Cat. B0770)
• 10% Phosphoric acid (85%, w/v; Sigma; Cat. 345245)
• 10% Ammonium sulfate (99%; Sigma; Cat. A4915)
• 20% Methanol (99.9%; Sigma; Cat. 34885)
Procedure
1. Solubilize membrane proteins in SDS sample buffer and analyze by 1D SDS-PAGE on a 10 × 7 cm gel (4% stacking gel, 12% separating gel; 29:1, w/w acrylamide/bisacrylamide).
2. Stain gels with colloidal blue Coomassie. If extensive protein fractionation is not required, proteins can be allowed to migrate only 0.5–1 cm into the separating gel, and a single slice that contains all of them is then cut from the gel. Alternatively, the polypeptides can be resolved fully and the whole lane split into multiple slices (e.g., 10–12 slices for a 7 cm or 20 for a 15 cm separating gel).
3. Transfer each gel slice into a 1.5-mL Eppendorf tube.
4. Destain the gel slices by washing them 3× with 100 μL 50% (v/v) acetonitrile/water and 50 mM ABS.
5. Add 100 μL of 50 mM ABS and reduce cysteines in the presence of 10 mM DTT (45 min; 56°C).
6. Remove the solution and wash with 100 μL 50 mM ABS.
7. Alkylate cysteines in 100 μL of 55 mM IAA in 50 mM ABS (45 min; 22°C; shaking; in the dark).
8. Wash gel slices with 50 mM ABS.
9. Digest proteins with 0.1 μg trypsin, overnight (sufficient for an estimated 5–10 μg of protein per gel slice; the recommended ratio of trypsin to protein is 1/50 to 1/100).
10. After trypsin digestion, tryptic peptides are released from the gel matrix into the soluble phase.
11. Transfer the tryptic peptides into a new 1.5-mL Eppendorf tube, wash the gel slice repeatedly with nanopure H2O and 50% ACN in nanopure H2O, and collect the washes, which contain the peptides.
12. Quench trypsin by acidifying the sample with 1–2 μL TFA, until pH < 2. Check the pH of the solution using pH paper and acidify more if needed.
13. Lyophilize peptides under vacuum (see Section 2.3.1) and proceed with the desalting step (see Section 2.3.3).
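The enzyme-to-protein ratios quoted in the two digestion protocols (1/100 for surface proteolysis, 1/50 to 1/100 for in-gel digestion) amount to simple arithmetic. The following is a minimal sketch of that calculation; the function names are hypothetical helpers introduced here for illustration and are not part of the published protocol.

```python
def trypsin_mass_ug(protein_ug: float, ratio_denominator: float) -> float:
    """Trypsin mass (ug) for a 1/ratio_denominator enzyme-to-protein ratio (w/w)."""
    if protein_ug <= 0 or ratio_denominator <= 0:
        raise ValueError("protein mass and ratio denominator must be positive")
    return protein_ug / ratio_denominator


def trypsin_range_ug(protein_ug: float,
                     low_denominator: float = 50,
                     high_denominator: float = 100) -> tuple[float, float]:
    """(minimum, maximum) trypsin mass for ratios between 1/100 and 1/50."""
    return (trypsin_mass_ug(protein_ug, high_denominator),
            trypsin_mass_ug(protein_ug, low_denominator))


# 10 ug of IMV protein at 1/100 corresponds to the 0.1 ug of trypsin used in Section 2.3.1.
surface_proteolysis = trypsin_mass_ug(10, 100)
# A gel slice estimated to hold 5 ug of protein, digested at 1/100 to 1/50.
gel_slice_low, gel_slice_high = trypsin_range_ug(5)
```

With these inputs, the 0.1 μg of trypsin recommended per gel slice sits at the upper end of the range for a 5 μg slice and at the lower end for a 10 μg slice.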
2.3.3 Desalting of Peptides Using C18 Tips
Desalting of peptides is an essential step prior to MS analysis. It improves peptide ionization efficiency and increases the lifetime of the columns. A common off-line method of peptide desalting uses columns containing a C18 matrix that binds the digested peptides (stop-and-go extraction tips, or STAGE tips) (Rappsilber, Mann, & Ishihama, 2007). Ready-made tips are available commercially, but here we propose a custom-made solution:
Materials and buffers
• 0.1% Formic acid in ultrapure H2O (18.2 MΩ) (0.1% FA)
• 50% ACN, 0.1% formic acid in ultrapure H2O (50% ACN; 0.1% FA)
• 80% ACN, 0.1% formic acid in ultrapure H2O (80% ACN; 0.1% FA)
• C18 disks (3M Empore™ C18 disks; Sigma-Aldrich)
Equipment
• Sonication bath (Branson 2510; Branson Ultrasonics; Danbury, USA)
• Table-top centrifuge (Eppendorf; 5430)
Note
In every centrifugation step, avoid drying the C18 matrix, as this reduces the capacity of the column to bind peptides. Because the columns do not all drain at the same rate, during each centrifugation step you can remove the tips through which the solution has already passed and continue spinning the rest until the step is complete.
Procedure
1. To prepare the C18 tips (columns), cut the tip of a common 200 μL tip (1 mm diameter), or use a blunt-tip pipetting needle of similar inner diameter, to extract small pieces of C18 matrix from the C18 disk.
2. Prepare C18 tips by stacking two pieces of C18 matrix, one on top of the other, in a common 200 μL tip. The binding capacity of the C18 matrix is estimated at 3–4 μg of peptides per C18 tip.
3. Rehydrate the C18 tip by washing it 3× with 60 μL 80% ACN, 0.1% FA, centrifuging at 600 × g for 1–2 min on a bench-top centrifuge at RT (22°C).
4. Equilibrate the C18 column by washing it 3× with 60 μL 0.1% FA, centrifuging at 600 × g for 1–2 min.
5. Resuspend peptides in 100 μL 0.1% FA, vortex, and sonicate in a sonication bath (22°C; 2 min constant).
6. Load the peptides on the columns and use a new 1.5-mL Eppendorf tube to collect the flow-through. Centrifuge at 600–800 × g for 1–2 min.
7. Repeat the loading step at least two more times to increase the amount of peptides that bind to the C18 matrix of the tip.
8. Wash the columns 3× with 60 μL 0.1% FA and centrifuge at 600–800 × g for 1–2 min.
9. Switch to a clean collection tube for the eluted peptides.
10. Elute bound peptides with 60 μL 50% ACN, 0.1% FA. Repeat the elution step two more times.
11. Lyophilize peptides (Speedvac; Savant) and store at −80°C until use.
2.3.4 Additional Proteomics Sample Preparation Methods
Depending on the goal of the analysis, membrane proteins in the original IMV sample can also be solubilized using detergents, with or without further fractionation (Fig. 2). When MS-incompatible detergents are used, protein digestion can be performed either in-solution using the filter-aided sample preparation protocol (Wisniewski et al., 2009) or after 1D or 2D SDS-PAGE and in-gel digestion. Alternatively, if MS-compatible detergents are used (e.g., DDM), proteins can be digested in-solution, and the digested peptides can be further fractionated using OFFGEL electrophoresis, which separates peptides in solution based on their isoelectric point, increasing the number of identified peptides and consequently protein coverage (Horth, Miller, Preckel, & Wenz, 2006).
3. PEPTIDE/PROTEIN IDENTIFICATION
3.1 Introduction
Given the range of available MS instruments, software platforms, and goals of each proteomic experiment, different approaches can be used for protein identification or quantification. Parameters affecting the selection of the analysis workflow include the aim of the experimental setup, the advantages of each analysis method, the available instrumentation and software, and the prior experience of each laboratory with specific analysis pipelines. In this section, we present workflows that have been followed in our laboratory for membrane proteomics studies.
3.2 MS Analysis
Lyophilized peptide samples are dissolved in an aqueous solution containing 0.1% FA and 5% ACN in ultrapure water (18.2 MΩ) and analyzed using nano-reverse-phase LC coupled to a Q Exactive™ Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Scientific, Bremen, Germany) through the EASY-spray nanoelectrospray ion source (Thermo Scientific, Bremen, Germany). Reverse-phase LC was performed on an EasySpray C18 column (Thermo Scientific, OD 360 μm, ID 50 μm, 15 cm length, C18 resin, 2 μm bead size) using a Dionex UltiMate 3000 UHPLC system at a flow rate of 300 nL/min. The LC mobile phase consisted of two buffer solutions: an aqueous solution containing 0.1% (v/v) FA (Buffer A) and an aqueous solution containing 0.08% (v/v) FA and 80% (v/v) ACN (Buffer B). A 60-min multistep gradient from Buffer A to Buffer B was used as follows (Buffer A:Buffer B percentages in parentheses): 0–3 min constant (96:4), 3–15 min (90:10), 15–35 min (65:35), 35–40 min (35:65), 40–41 min (5:95), 41–50 min (5:95), 50–51 min (95:5), and 51–60 min (95:5). These conditions were optimized for the surface proteolysis protocol on E. coli IMVs. Depending on the sample complexity, several parameters can be further adjusted.
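When such a gradient is transcribed into instrument method files or lab notebooks, a small sanity check can catch typing errors. The sketch below stores the segments exactly as listed above and checks that they are contiguous and that each Buffer A:Buffer B pair sums to 100%. The names `GRADIENT`, `validate_gradient`, and `listed_ratio` are hypothetical. Apart from the 0–3 min hold, the text does not state whether a listed ratio is held or is the end point of a ramp, so the sketch only returns the listed pair for a segment.

```python
# (start_min, end_min, percent_buffer_A, percent_buffer_B), as listed in the text.
GRADIENT = [
    (0, 3, 96, 4),
    (3, 15, 90, 10),
    (15, 35, 65, 35),
    (35, 40, 35, 65),
    (40, 41, 5, 95),
    (41, 50, 5, 95),
    (50, 51, 95, 5),
    (51, 60, 95, 5),
]


def validate_gradient(steps):
    """Check segment order, contiguity, and A + B = 100; return total run time (min)."""
    for index, (start, end, pct_a, pct_b) in enumerate(steps):
        if end <= start:
            raise ValueError(f"segment {index} has non-positive duration")
        if pct_a + pct_b != 100:
            raise ValueError(f"segment {index} percentages do not sum to 100")
        if index > 0 and steps[index - 1][1] != start:
            raise ValueError(f"segment {index} does not start where the previous one ends")
    return steps[-1][1] - steps[0][0]


def listed_ratio(steps, minute):
    """Return the (A, B) pair listed for the segment that contains `minute`."""
    for start, end, pct_a, pct_b in steps:
        if start <= minute < end:
            return pct_a, pct_b
    if minute == steps[-1][1]:  # the final time point belongs to the last segment
        return steps[-1][2], steps[-1][3]
    raise ValueError("time point lies outside the gradient")


total_minutes = validate_gradient(GRADIENT)
ratio_at_20_min = listed_ratio(GRADIENT, 20)
```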
Peptides were analyzed in the Orbitrap QE operated in positive ion mode (nanospray voltage 1.5 kV, source temperature 250°C). The instrument was operated in data-dependent acquisition mode with a survey MS scan at a resolution of 70,000 FWHM over the mass range m/z 400–1600 for precursor ions, followed by MS/MS scans of the 10 most intense software-selected peaks with charge states +2, +3, and +4 above a threshold ion count of 16,000, at 35,000 FWHM resolution, using higher-energy collisional dissociation (HCD). MS/MS was performed using a normalized collision energy (NCE) of 25%, an isolation window of 3.0 m/z, an apex trigger of 5–15 s, and a dynamic exclusion of 10 s. Data were acquired with Xcalibur 2.2 software (Thermo Scientific).
3.3 Peptide/Protein Identification
3.3.1 Peptide Identification: Approach 1
Raw MS files were processed using Proteome Discoverer v1.1 (Thermo Scientific) with both the Mascot 2.3 (Matrix Science) and SEQUEST (Thermo Scientific) search algorithms against the E. coli BL21 theoretical proteome as published in UniProt (tax.id: 511693; December 2010; 4156 entries) and common contaminants (e.g., keratins). This proteome was manually curated for protein topology (data are available through STEPdb, http://www.stepdb.eu; Orfanoudaki & Economou, 2014) and was used for the data analysis. Enzyme specificity was set to trypsin, with a maximum of two missed cleavages allowed. Dynamic (methionine oxidation and N-terminal acetylation) and fixed (S-carbamidomethylation of cysteinyl residues) modifications were included. Precursor ion mass error was set to 10 ppm and fragment ion mass error to 0.02 Da. The search engine output files in “.mgf” and “.msf” format were imported into Scaffold v.4.4 (Proteome Software; Portland, USA). Scores from both the Mascot and SEQUEST algorithms were combined through the PeptideProphet and ProteinProphet algorithms (Keller, Nesvizhskii, Kolker, & Aebersold, 2002; Nesvizhskii, Keller, Kolker, & Aebersold, 2003; Searle, Turner, & Nesvizhskii, 2008) in Scaffold. We routinely set thresholds for protein and peptide identification through the ProteinProphet and PeptideProphet algorithms to 99% and 95%, respectively, and require a minimum of two peptides identified per protein. For proteins identified with these criteria, the false discovery rate (FDR) is <0.1%. Depending on the aim of the study, less strict criteria can be used, by taking a protein FDR <1% as the threshold for discriminating positive from negative hits and accepting a minimum of one peptide per protein.
3.3.2 Peptide Identification: Approach 2
Raw MS files were imported into the MaxQuant v1.5.3.30 software package (Cox & Mann, 2008). MS/MS spectra were searched using the Andromeda search engine against the sequences of the E. coli proteome of interest and common contaminants, and then filtered for a minimum Andromeda score of 75 in order to exclude low-confidence identifications. Enzyme specificity was set to trypsin, allowing for a maximum of two missed cleavages. Dynamic (methionine oxidation and N-terminal acetylation) and fixed (S-carbamidomethylation of cysteinyl residues) modifications were selected. Precursor and MS/MS mass tolerance was set to 10 ppm for the first search (to identify the maximum number of peptides for mass and retention time calibration) and 4.5 ppm for the main search (to refine the identifications). Peptide features were aligned with a match time window of 0.7 min. The peptides report file was further processed for protein identification and quantification.
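Two of the search parameters above, the number of allowed missed cleavages and the mass tolerance in ppm, can be illustrated with a short sketch. It is not how Mascot, SEQUEST, or Andromeda are implemented; it only shows what the settings mean. The cleavage rule used here (cut C-terminal to K or R, but not before P) is a common convention and an assumption of this example, and the function names and the toy sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence, max_missed_cleavages=2):
    """In-silico tryptic digest: cleave after K/R (not before P), allowing missed cleavages."""
    # Positions just after each cleavable K or R; a site at the very end adds nothing.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavage sites inside the peptide.
        last = min(i + 2 + max_missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tolerance_ppm=10.0):
    """True if the observed value matches the theoretical one within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


# Toy sequence: the R at position 3 is followed by P and is therefore not cleaved.
fully_cleaved = tryptic_peptides("MKRPAKGR", max_missed_cleavages=0)
with_missed = tryptic_peptides("MKRPAKGR", max_missed_cleavages=2)
```

Raising `max_missed_cleavages` enlarges the list of candidate peptides, which is why this setting and the mass tolerance both affect the size of the search space.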
4. QUANTIFICATION
4.1 Introduction
In this section, we present quantification methods that have been tested previously in the lab. For more detailed information, please follow the references included in the text.
Quantitative proteomics approaches generally include labeling and label-free methods (Bantscheff, Lemeer, Savitski, & Kuster, 2012). Labeling methods can reduce the variation between samples, since the samples are pooled and analyzed together. In addition, labeling simplifies the comparison across samples when a peptide fractionation or enrichment step is included during sample preparation (e.g., enrichment of modified peptides) (Ong & Mann, 2007). Labeling methods are usually expensive. In contrast, label-free methods are simpler and inexpensive, which allows their use in large datasets, and they can generate statistically robust results if multiple repeats are performed.
Label-free methods can be further divided into spectral-based methods and intensity-based methods. Common spectral-based methods include spectral counting, which takes into account the number of MS/MS scans for each peptide of a protein (Liu, Sadygov, & Yates, 2004); emPAI, which uses the number of observed peptides over the number of observable peptides (Ishihama et al., 2005); the normalized spectral abundance factor (NSAF), which uses the number of spectral counts and the protein length (Paoletti et al., 2006); the spectral index (SI), which includes information on peptide count, spectral count, and fragment-ion intensity (Griffin et al., 2010); and modifications of these. Spectral counts correlate with protein abundance; however, the resolution of quantification decreases for low-abundance proteins (Old et al., 2005).
For intensity-based quantification, depending on the aim of the analysis, different approaches can be used, such as comparison of the intensities of common peptides across the samples, the sum of peptide intensities, or modifications of these methods (Wilhelm et al., 2014). Comparison of common peptides can be reliable if the candidate peptides are carefully chosen. Variation in peptide abundance due to missed cleavages, modifications, or the physicochemical properties of the peptides can influence the quantification process. For this reason, principles similar to those applied in targeted proteomics workflows, together with prior experience, should guide the selection of candidate peptides (Picotti & Aebersold, 2012). Simpler approaches such as the sum of peptide intensities can correct for peptide variation and can be easily applied to high-throughput datasets (Bantscheff et al., 2012). In addition, more sophisticated algorithms that take into account the peptide intensities and the number of MS-detectable peptides (iBAQ) (Schwanhausser et al., 2011) or the pairwise differences of peptide intensities across samples (MaxLFQ) (Cox et al., 2014) can be used through the corresponding scripts or software platforms. Typical analysis workflows begin with normalization of the samples, which corrects for technical variation and systematic bias and thus allows peptide intensities to be compared across samples.
Common normalization approaches include linear regression, local regression, quantile normalization, and central tendency normalization (or global normalization) around the mean, the median, or a fixed value (Callister et al., 2006; Karpievitch, Dabney, & Smith, 2012). Peptide intensities are first log transformed so that they approximate a normal distribution, and then the desired normalization method is applied. Selection of the appropriate normalization method depends on the source of variation (Callister et al., 2006; Karpievitch et al., 2012).
For the identification of differentially abundant proteins, depending on the biological question, univariate (e.g., t-test, analysis of variance, Mann–Whitney, Kruskal–Wallis) or multivariate [e.g., partial least squares (PLS) regression, principal component analysis, clustering] analysis workflows and their combinations can be followed (Carpentier, 2016; Jung, 2011). Univariate methods consider each variable as independent and apply a statistical test for significance to each protein. Because many proteins are tested at once, p-values should be adjusted afterward to correct for multiple hypothesis testing (Noble, 2009). Common methods in the proteomics field for adjusting p-values are the Bonferroni and the Benjamini–Hochberg corrections (Benjamini & Hochberg, 1995), the former being more stringent than the latter (Diz, Carvajal-Rodriguez, & Skibinski, 2011). Multivariate analysis approaches, on the other hand, use information from the complete dataset instead of individual proteins and can capture correlations between groups. However, care should be taken to cross-validate the model to avoid overfitting to the data (Rosenberg, Franzen, Auer, Lehtiö, & Forshed, 2010).
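To make the difference in stringency between the two corrections concrete, the following minimal sketch implements both in plain Python. It follows the standard definitions (Bonferroni multiplies each p-value by the number of tests; Benjamini–Hochberg scales each p-value by the number of tests divided by its rank and enforces monotonicity). The input p-values are invented for illustration only.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented p-values for four proteins (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

For these four values, every Bonferroni-adjusted p-value is at least as large as its Benjamini–Hochberg counterpart, which is the sense in which the first correction is the more stringent one.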
4.2 Spectral-Based Quantification of Membrane Proteins
Spectral counts correlate with the actual protein abundance (Old et al., 2005). Previously, we tested two spectral-based approaches (NSAF and emPAI) for the quantification of membrane proteins and proposed additional modifications (Papanastasiou et al., 2016). NSAF is the number of spectral counts of a protein divided by its length, normalized by the sum of the spectral count/length ratios of all proteins in the dataset (Paoletti et al., 2006). The emPAI algorithm exponentiates the ratio of the number of identified peptides to the number of observable peptides (emPAI = 10 raised to this ratio, minus 1) (Ishihama et al., 2005). For soluble proteins, both algorithms give quantitative values that correlate well with the actual protein abundance. However, in a surface proteolysis experiment, only the soluble segments of integral membrane proteins are expected to be detected, since the transmembrane domains remain embedded in the lipid layer. To correct for this experimental bias, we previously proposed adjusting the NSAF and emPAI algorithms for membrane proteins by recalculating the protein length or the number of MS-detectable peptides after removing the transmembrane domains of the membrane proteins (NSAFSP and emPAISP). With this adjustment, the quantitative values for membrane proteins correlated well with protein abundance (Papanastasiou et al., 2016).
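The two indices can be written down in a few lines. The sketch below follows the published definitions of NSAF and emPAI; the surface-proteolysis variant is only approximated here by subtracting a count of transmembrane residues from the protein length, which is an assumption of this example and not necessarily the exact procedure of Papanastasiou et al. (2016). The protein names and all numbers are invented.

```python
def nsaf(spectral_counts, lengths):
    """NSAF per protein: (SpC / L) divided by the sum of SpC / L over all proteins."""
    saf = {protein: spectral_counts[protein] / lengths[protein]
           for protein in spectral_counts}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}


def empai(n_observed, n_observable):
    """emPAI = 10 ** (observed peptides / observable peptides) - 1."""
    return 10 ** (n_observed / n_observable) - 1


def soluble_lengths(lengths, tm_residues):
    """Hypothetical helper: protein length after removing transmembrane residues."""
    return {protein: lengths[protein] - tm_residues.get(protein, 0)
            for protein in lengths}


# Invented example: one soluble protein and one membrane protein.
counts = {"soluble_A": 10, "membrane_B": 20}
lengths = {"soluble_A": 100, "membrane_B": 400}
tm = {"membrane_B": 200}  # residues assumed to lie within transmembrane domains

plain = nsaf(counts, lengths)
adjusted = nsaf(counts, soluble_lengths(lengths, tm))
```

In this toy case the membrane protein's share rises from one third to one half once its transmembrane residues are excluded from the length, which illustrates the direction of the correction.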
4.3 Intensity-Based Label-Free Quantification of Membrane Proteins
4.3.1 Peptide Intensity Normalization
In the present example of a dataset derived from surface proteolysis, we compare two conditions, untreated IMVs and urea-treated IMVs (labeled as IMVs and urea-IMVs), by correlating the summed peptide intensities of each protein between samples. Since urea treatment of the membranes removes a large proportion of nonspecific cytoplasmic contaminants and peripheral inner membrane proteins, we can normalize our dataset using only the peptide intensities of the membrane proteins, by applying central tendency normalization around a mean value (Callister et al., 2006). The rationale behind this normalization method is that technical variation (e.g., the amount of sample loaded) shifts the peptide intensities between different runs by a constant factor that can be calculated.
For dataset normalization, we first assign a topology annotation to each peptide, using the STEPdb database (http://www.stepdb.eu) (Orfanoudaki & Economou, 2014). Peptide intensities of the membrane proteins are log transformed and plotted to visualize the distribution of the data (Fig. 3A, left panel). We define one sample as the reference and compare each remaining sample (test sample) with it. A normalization factor for each run is calculated by subtracting the log-transformed intensity of each peptide in the reference from that of the same peptide in the test sample and then averaging the differences. These calculations can be easily performed even in a simple spreadsheet (e.g., Microsoft Excel), without any programming knowledge. This normalization factor is used to correct all the peptide intensities of the test samples (Fig. 3A, right panel). Next, we calculate the normalized sum of peptide intensities for all the identified proteins from the log-transformed intensities. The within-group variation of protein quantification, for proteins identified in all the biological and technical repeats of an experiment, can be assessed by calculating the coefficient of variation (CV).
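The normalization and the CV calculation described above can be sketched in a few lines of Python for readers who prefer a script to a spreadsheet. Several details are assumptions of this example rather than statements from the text: log2 is used as the log transform, only peptides present in both runs contribute to the factor, and the peptide names and intensities are invented.

```python
import math
from statistics import mean, stdev


def normalization_factor(reference, test, membrane_peptides):
    """Mean log2(test) - log2(reference) over membrane-protein peptides found in both runs."""
    shared = [p for p in membrane_peptides if p in reference and p in test]
    if not shared:
        raise ValueError("no shared membrane-protein peptides to normalize on")
    return mean(math.log2(test[p]) - math.log2(reference[p]) for p in shared)


def normalize(test, factor):
    """Subtract the run-specific factor from every log2 peptide intensity of the test run."""
    return {peptide: math.log2(intensity) - factor
            for peptide, intensity in test.items()}


def cv_percent(values):
    """Coefficient of variation (%) of replicate measurements on the linear scale."""
    return 100 * stdev(values) / mean(values)


# Invented intensities: the test run was loaded with twice as much sample.
reference_run = {"pepA": 100.0, "pepB": 200.0, "pepC": 400.0}
test_run = {"pepA": 200.0, "pepB": 400.0, "pepC": 800.0, "pepD": 50.0}
membrane = {"pepA", "pepB"}  # peptides annotated as belonging to membrane proteins

factor = normalization_factor(reference_run, test_run, membrane)
normalized_test = normalize(test_run, factor)
replicate_cv = cv_percent([90.0, 100.0, 110.0])
```

The factor is derived from the membrane-protein peptides only, but it is applied to every peptide in the test run, including those (such as `pepD`) that are absent from the reference.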
In an experiment of this type, the majority of proteins, from all subcellular compartments, show a CV value <30%, suggesting high reproducibility between the repeats (Fig. 3B). After normalization of all peptide intensities based on the intensities of peptides from membrane proteins, the distribution of the CV for protein intensity (sum of peptide intensities) shifts to the left, implying improved reproducibility of peptide/protein quantification.

Fig. 3 Normalization, statistical analysis, and identification of differentially abundant proteins. We compare two conditions of IMVs: untreated IMVs (IMVs) and urea-treated IMVs (Urea IMVs), in which a large proportion of membrane-associated proteins is removed. Five technical repeats were tested for each condition. (A) Peptides belonging to membrane proteins can be used to normalize peptide intensities across the samples, by applying central tendency normalization on the mean value.

4.3.2 Quantification and Statistical Analysis
4.3.2.1 Univariate Analysis
A common approach to identify differentially abundant proteins between two sample categories is to test for statistical significance in protein abundance and also compare the fold difference of protein intensity between groups. A two-tailed t-test can be applied, assuming normality of the intensity distribution for each protein, and the fold difference is calculated by dividing the average quantitative values of the two treatments. Common significance thresholds include a p-value <0.05 and a fold difference >2. In addition, p-values should be adjusted to correct for multiple hypothesis testing. In the dataset presented here, p-values were corrected using the p.adjust function in the R language (R Core Team (2016), Vienna, Austria). Differentially abundant proteins are plotted in a scatter plot (volcano plot) using R, after log transformation (Fig. 3C).
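As a rough illustration of the selection step, the sketch below applies a multiple-testing correction and the two thresholds to precomputed p-values. Several points are assumptions rather than details given in the text: the Benjamini-Hochberg procedure is used (the text does not say which p.adjust method was chosen), the fold-difference threshold is applied in both directions, and the helper names and input layout are hypothetical. The raw p-values are taken as given, for example from a two-tailed t-test computed elsewhere.

```python
from statistics import mean


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_differential(proteins, p_cut=0.05, fold_cut=2.0):
    """Return names of proteins passing both thresholds.

    `proteins` maps a name to (intensities_a, intensities_b, raw_p_value).
    The fold difference is mean(a) / mean(b); a protein passes if it is
    above fold_cut or below 1 / fold_cut (assumed symmetric threshold).
    """
    names = list(proteins)
    adjusted = bh_adjust([proteins[n][2] for n in names])
    selected = []
    for name, adj_p in zip(names, adjusted):
        a, b, _ = proteins[name]
        fold = mean(a) / mean(b)
        if adj_p < p_cut and (fold > fold_cut or fold < 1.0 / fold_cut):
            selected.append(name)
    return selected
```

Plotting log2 of the fold difference against -log10 of the adjusted p-value for every protein gives the volcano plot layout referred to in the text.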
4.3.2.2 Multivariate Analysis
The summed peptide intensities of each protein were normalized to the maximum value across the samples, so that all proteins have the same maximum intensity. Partial least squares discriminant analysis (PLS-DA) was performed in the R language using the mdatools package. After building the PLS model with our dataset, the percentage of variation in the dataset explained by the first five components is plotted (Fig. 3D). The first components explain most of the variation, whereas the remaining components make a minor contribution (Fig. 3D/E). This was expected, since we are comparing technical repeats of the same initial sample. For the next steps, we use the first two components, which explain 90% of the total variation.

Fig. 3 (continued) (B) The distribution of CV for the sum of peptide intensities of each protein before and after normalization. Reproducibility of protein quantification is improved after the normalization process. (C) Identification of differentially abundant proteins between the two treatments, using univariate methods (t-test and fold change). Each circle represents one protein. Cytoplasmic or peripherally associated proteins show higher abundance in the untreated condition, as expected. After urea treatment, membrane proteins are more accessible to trypsin, and a small number of peripherally associated proteins remain on the membranes. (D) The same dataset was analyzed using the PLS-DA method. The first component explains most of the variance of the model, and together with the second, approximately 90% of the variance is explained. (E) Score plot for the first two components. Each circle represents one technical repeat. The variance explained by each component is noted in parentheses. The technical repeats for the two conditions are clearly separated on the x-axis, which explains most of the variation of the model. (F) Loadings plot for the first two components. Each circle represents one protein. Membrane proteins are localized in the center of the plot, suggesting small variation across the samples. Cytoplasmic and membrane-associated proteins are shifted from the center of the plot, showing differential abundance in the dataset, and are positioned on the left side (where the IMVs condition is localized based on (E)), suggesting higher abundance in the IMVs condition. (G) The VIP method was used to select proteins showing differential abundance (variable selection). Differentially abundant proteins were selected based on their VIP score, with the threshold set to 1.
The “scores plot” of the PLS model shows that the samples of the two groups are separated on the x-axis, which explains most of the variation (Fig. 3E). Variation of protein abundance across the samples is presented in the “loadings plot” of the PLS model (Fig. 3F). Membrane proteins show similar abundance across the samples and are positioned in the center of the plot. This distribution was expected, since the dataset is normalized on the peptides of the membrane proteins. In contrast, cytoplasmic proteins or peripherally associated proteins, which are also present on the urea-untreated IMVs during purification, are shifted from the center of the plot toward the left side, suggesting higher abundance in the urea-untreated condition than in the urea-treated IMVs. Removal of peripheral proteins by urea exposes more peptides of membrane proteins for tryptic cleavage and identification (Papanastasiou et al., 2016). These proteins are now overrepresented in the urea-treated condition. This observation suggests that physiological protein–protein interactions are retained during the sample preparation process, allowing the use of this method for the study of dynamic phenomena (e.g., protein binding on a membrane receptor).

For variable selection, we used the VIP method (variable importance in projection), which ranks proteins based on their contribution to the total variation, and selected the variables with a VIP score >1 (Fig. 3G) (Chong & Jun, 2005). Additional univariate analysis methods can be combined to further validate the selected candidate proteins (Tsolis et al., 2016). Ninety percent of the proteins selected using the multivariate approach were also selected by the univariate method, showing that the two approaches are in good agreement and represent valid alternatives. Depending on the aim of the analysis, additional statistical/bioinformatics methods can be applied to the differentially abundant proteins.
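To make the multivariate steps more concrete, the sketch below scales each protein to its maximum across samples, fits a small PLS model, and computes VIP scores. It is a stand-in written in plain Python, not the mdatools workflow used for the dataset. The assumptions are: the two groups are coded as a single 0/1 response (so PLS-DA reduces to PLS1), the model is fitted with a textbook NIPALS loop, and VIP follows the usual weight-based formula. All function names and the toy data are hypothetical.

```python
import math


def scale_to_max(X):
    """Divide each protein (column) by its maximum across samples (rows).

    Assumes every protein has a positive maximum intensity.
    """
    n_cols = len(X[0])
    maxima = [max(row[j] for row in X) for j in range(n_cols)]
    return [[row[j] / maxima[j] for j in range(n_cols)] for row in X]


def center_columns(X):
    """Subtract the column mean from every column."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def pls1_vip(X, y, n_components=2):
    """VIP score per protein from a NIPALS PLS1 fit (X: samples x proteins)."""
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [v - y_mean for v in y]
    n, p = len(X), len(X[0])
    weights, explained = [], []
    for _ in range(n_components):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [v / norm for v in w]  # unit-length weight vector
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        q = sum(y[i] * t[i] for i in range(n)) / tt  # y-loading
        loadings = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        # Deflate X and y before extracting the next component.
        X = [[X[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component
    total = sum(explained)
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Hypothetical toy data: 3 untreated and 3 urea-treated repeats, 3 proteins.
# Protein 1 mimics a peripheral protein that is depleted by urea.
intensities = [
    [10.0, 8.0, 5.0], [10.5, 8.4, 5.2], [9.8, 7.9, 4.9],
    [10.2, 2.0, 5.1], [9.9, 2.1, 5.0], [10.1, 1.9, 4.8],
]
groups = [1, 1, 1, 0, 0, 0]  # 1 = IMVs, 0 = urea-IMVs (assumed coding)
vip = pls1_vip(scale_to_max(intensities), groups)
selected = [j for j, score in enumerate(vip) if score > 1]
```

Because each weight vector has unit length, the squared VIP scores average to 1 across proteins, which is why a threshold of 1 is a natural cutoff for above-average contribution.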
5. FUNCTIONAL ANNOTATION
After identifying the proteins of interest with confidence, the next step is to search for biological functions. A common way to start the analysis is to use the gene ontology (http://www.geneontology.org) (Ashburner et al., 2000) or clusters of orthologous groups databases (Tatusov, Koonin, & Lipman, 1997) to identify enriched biological processes, functions, or topologies. Filtering the dataset into a specific topological category (e.g., membrane-associated proteins) allows a more precise annotation. Our experience is that even for premier biological models, such as E. coli, the available information in public protein or functional databases is incomplete
or oftentimes wrong. Thus, these tools should be used with some caution. For E. coli K-12, BL21(DE3), and EPEC (O127:H6), the complete topological annotation of the entire proteome has been manually curated using in vivo and in vitro experiments, bioinformatics, export systems, protein–protein interactions, and other data (Orfanoudaki & Economou, 2014, unpublished; Papanastasiou et al., 2013) and is available through the STEPdb database (http://www.stepdb.eu) (Orfanoudaki & Economou, 2014). Depending on the goals of the analysis, pathway databases that also contain manually curated data and additional tools can be accessed through the KEGG (Kanehisa, Sato, Kawashima, Furumichi, & Tanabe, 2016), WikiPathways (Kutmon & Riutta, 2016), or BioCyc (Caspi et al., 2016) databases. In addition, several of these tools can be accessed through integrated platforms for functional annotation, such as WebGestalt (Zhang, Kirov, & Snoddy, 2005) or DAVID (Huang da, Sherman, & Lempicki, 2009).
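For readers who want to see what an enrichment test looks like, the following sketch computes a one-sided hypergeometric (over-representation) p-value for a single annotation term. The choice of this test is an assumption, since the text does not specify which statistic the annotation platforms apply. The function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background, n_annotated, n_selected, n_hits):
    """Probability of seeing at least n_hits annotated proteins by chance.

    n_background: proteins in the reference set (e.g., all identified proteins)
    n_annotated:  background proteins carrying the term of interest
    n_selected:   proteins in the list being tested
    n_hits:       selected proteins carrying the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 1000 background proteins carry the term,
# and 8 of the 50 selected proteins do.
p_value = enrichment_pvalue(1000, 40, 50, 8)
```

When many terms are tested, the resulting p-values need the same multiple-testing correction discussed for the univariate analysis. Restricting the background to the relevant topological category, as suggested above, changes n_background and n_annotated and therefore the result.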
6. CONCLUSION
In membrane proteomics workflows, technical limitations, including the isolation and stability of transmembrane proteins or the compatibility of detergents with LC–MS/MS instruments, require adaptations of the sample preparation protocols. As in every experimental setup, sample purity is a main parameter that affects the quality of the data. For this reason, detailed membrane sample preparation is essential for membrane characterization and identification of low-abundance membrane proteins. Surface proteolysis of IMVs provides a fast, simple, and reproducible method for characterization of the inner membrane proteome. It takes a low toll on MS-time, which allows for multiple repeats to statistically strengthen the data. In addition, both label-based and label-free quantification methods can be applied for the comparative analysis of membranes. However, depending on the aim of the study, a membrane proteomics workflow should be tailored and optimized for the specific sample type and protein(s) of interest.
ACKNOWLEDGMENTS
We are grateful to S. Carpentier for useful discussions and advice. Research in our lab was funded by Grants: KUL-Spa (Onderzoekstoelagen 2013, Bijzonder Onderzoeksfonds, KU Leuven); (Vlaanderen Onderzoeksprojecten, FWO: RiMembR #G0C6814N, T3RecS #G002516N, and #G0B4915N); StrepSynth (FP7 KBBE.2013.3.6-02: Synthetic Biology toward applications; #613877, EU); DIP-BiD (#AKUL/15/40—G0H2116N, Hercules/FWO); and IOF (Industrieel Onderzoeksfonds KU Leuven, 3M140254/ZKC8143/KP/14/008).
REFERENCES
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., … Sherlock, G. (2000). Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nature Genetics, 25(1), 25–29. http://dx.doi.org/10.1038/75556.
Bantscheff, M., Lemeer, S., Savitski, M. M., & Kuster, B. (2012). Quantitative mass spectrometry in proteomics: Critical review update from 2007 to the present. Analytical and Bioanalytical Chemistry, 404(4), 939–965. http://dx.doi.org/10.1007/s00216-012-6203-4.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57(1), 289–300.
Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W. J., Webb-Robertson, B. J., … Lipton, M. S. (2006). Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. Journal of Proteome Research, 5(2), 277–286. http://dx.doi.org/10.1021/pr050300l.
Carpentier, S. C. (2016). Multiple testing and pattern recognition in 2-DE proteomics. Methods in Molecular Biology, 1384, 215–235. http://dx.doi.org/10.1007/978-1-4939-3255-9_13.
Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C. A., Keseler, I. M., … Karp, P. D. (2016). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research, 44(D1), D471–D480. http://dx.doi.org/10.1093/nar/gkv1164.
Chong, I.-G., & Jun, C.-H. (2005). Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78(1–2), 103–112. http://dx.doi.org/10.1016/j.chemolab.2004.12.011.
Cox, J., Hein, M. Y., Luber, C. A., Paron, I., Nagaraj, N., & Mann, M. (2014). Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Molecular & Cellular Proteomics, 13(9), 2513–2526. http://dx.doi.org/10.1074/mcp.M113.031591.
Cox, J., & Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology, 26(12), 1367–1372. http://dx.doi.org/10.1038/nbt.1511.
Diz, A. P., Carvajal-Rodriguez, A., & Skibinski, D. O. (2011). Multiple hypothesis testing in proteomics: A strategy for experimental work. Molecular & Cellular Proteomics, 10(3), M110.004374. http://dx.doi.org/10.1074/mcp.M110.004374.
Futai, M. (1974). Orientation of membrane vesicles from Escherichia coli prepared by different procedures. The Journal of Membrane Biology, 15(1), 15–28.
Griffin, N. M., Yu, J., Long, F., Oh, P., Shore, S., Li, Y., … Schnitzer, J. E. (2010). Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nature Biotechnology, 28(1), 83–89. http://dx.doi.org/10.1038/nbt.1592.
Horth, P., Miller, C. A., Preckel, T., & Wenz, C. (2006). Efficient fractionation and improved protein identification by peptide OFFGEL electrophoresis. Molecular & Cellular Proteomics, 5(10), 1968–1974. http://dx.doi.org/10.1074/mcp.T600037-MCP200.
Huang da, W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1), 44–57. http://dx.doi.org/10.1038/nprot.2008.211.
Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J., & Mann, M. (2005). Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Molecular & Cellular Proteomics, 4(9), 1265–1272. http://dx.doi.org/10.1074/mcp.M500061-MCP200.
Jung, K. (2011). Statistics in experimental design, preprocessing, and analysis of proteomics data. Methods in Molecular Biology, 696, 259–272. http://dx.doi.org/10.1007/978-1-60761-987-1_16.
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., & Tanabe, M. (2016). KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research, 44(D1), D457–D462. http://dx.doi.org/10.1093/nar/gkv1070.
Karpievitch, Y. V., Dabney, A. R., & Smith, R. D. (2012). Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics, 13(16), 1–9. http://dx.doi.org/10.1186/1471-2105-13-s16-s5.
Keller, A., Nesvizhskii, A. I., Kolker, E., & Aebersold, R. (2002). Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry, 74(20), 5383–5392.
Kutmon, M., & Riutta, A. (2016). WikiPathways: Capturing the full diversity of pathway knowledge. Nucleic Acids Research, 44(D1), D488–D494. http://dx.doi.org/10.1093/nar/gkv1024.
Liu, H., Sadygov, R. G., & Yates, J. R., 3rd. (2004). A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Analytical Chemistry, 76(14), 4193–4201. http://dx.doi.org/10.1021/ac0498563.
Nesvizhskii, A. I., Keller, A., Kolker, E., & Aebersold, R. (2003). A statistical model for identifying proteins by tandem mass spectrometry. Analytical Chemistry, 75(17), 4646–4658.
Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology, 27(12), 1135–1137.
Old, W. M., Meyer-Arendt, K., Aveline-Wolf, L., Pierce, K. G., Mendoza, A., Sevinsky, J. R., … Ahn, N. G. (2005). Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Molecular & Cellular Proteomics, 4(10), 1487–1502. http://dx.doi.org/10.1074/mcp.M500084-MCP200.
Ong, S.-E., & Mann, M. (2007). A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC). Nature Protocols, 1(6), 2650–2660.
Orfanoudaki, G., & Economou, A. (2014). Proteome-wide subcellular topologies of E. coli polypeptides database (STEPdb). Molecular & Cellular Proteomics, 13(12), 3674–3687. http://dx.doi.org/10.1074/mcp.O114.041137.
Paoletti, A. C., Parmely, T. J., Tomomori-Sato, C., Sato, S., Zhu, D., Conaway, R. C., … Washburn, M. P. (2006). Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proceedings of the National Academy of Sciences of the United States of America, 103(50), 18928–18933. http://dx.doi.org/10.1073/pnas.0606379103.
Papanastasiou, M., Orfanoudaki, G., Koukaki, M., Kountourakis, N., Sardis, M. F., Aivaliotis, M., … Economou, A. (2013). The Escherichia coli peripheral inner membrane proteome. Molecular & Cellular Proteomics, 12(3), 599–610. http://dx.doi.org/10.1074/mcp.M112.024711.
Papanastasiou, M., Orfanoudaki, G., Kountourakis, N., Koukaki, M., Sardis, M. F., Aivaliotis, M., … Economou, A. (2016). Rapid label-free quantitative analysis of the E. coli BL21(DE3) inner membrane proteome. Proteomics, 16(1), 85–97. http://dx.doi.org/10.1002/pmic.201500304.
Picotti, P., & Aebersold, R. (2012). Selected reaction monitoring-based proteomics: Workflows, potential, pitfalls and future directions. Nature Methods, 9(6), 555–566. http://dx.doi.org/10.1038/nmeth.2015.
Pieper, R., Huang, S. T., Clark, D. J., Robinson, J. M., Alami, H., Parmar, P. P., … Peterson, S. N. (2009). Integral and peripheral association of proteins and protein complexes with Yersinia pestis inner and outer membranes. Proteome Science, 7, 5. http://dx.doi.org/10.1186/1477-5956-7-5.
Rabilloud, T. (2009). Membrane proteins and proteomics: Love is possible, but so difficult. Electrophoresis, 30(Suppl. 1), S174–S180. http://dx.doi.org/10.1002/elps.200900050.
Rappsilber, J., Mann, M., & Ishihama, Y. (2007). Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nature Protocols, 2(8), 1896–1906. http://dx.doi.org/10.1038/nprot.2007.261.
Rosenberg, L. H., Franzen, B., Auer, G., Lehtiö, J., & Forshed, J. (2010). Multivariate meta-analysis of proteomics data from human prostate and colon tumours. BMC Bioinformatics, 11(1), 1–12. http://dx.doi.org/10.1186/1471-2105-11-468.
Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., … Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature, 473(7347), 337–342. http://dx.doi.org/10.1038/nature10098.
Searle, B. C., Turner, M., & Nesvizhskii, A. I. (2008). Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies. Journal of Proteome Research, 7(1), 245–253. http://dx.doi.org/10.1021/pr070540w.
Smith, P. K., Krohn, R. I., Hermanson, G. T., Mallia, A. K., Gartner, F. H., Provenzano, M. D., … Klenk, D. C. (1985). Measurement of protein using bicinchoninic acid. Analytical Biochemistry, 150(1), 76–85.
Solis, N., & Cordwell, S. J. (2011). Current methodologies for proteomics of bacterial surface-exposed and cell envelope proteins. Proteomics, 11(15), 3169–3189. http://dx.doi.org/10.1002/pmic.201000808.
Speers, A. E., & Wu, C. C. (2007). Proteomics of integral membrane proteins—Theory and application. Chemical Reviews, 107(8), 3687–3714. http://dx.doi.org/10.1021/cr068286z.
Tatusov, R. L., Koonin, E. V., & Lipman, D. J. (1997). A genomic perspective on protein families. Science, 278(5338), 631–637.
Tsolis, K. C., Bagli, E., Kanaki, K., Zografou, S., Carpentier, S., Bei, E. S., … Economou, A. (2016). Proteome changes during transition from human embryonic to vascular progenitor cells. Journal of Proteome Research, 15(6), 1995–2007. http://dx.doi.org/10.1021/acs.jproteome.6b00180.
Wallin, E., & von Heijne, G. (1998). Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Science, 7(4), 1029–1038. http://dx.doi.org/10.1002/pro.5560070420.
Wilhelm, M., Schlegl, J., Hahne, H., Gholami, A. M., Lieberenz, M., Savitski, M. M., … Kuster, B. (2014). Mass-spectrometry-based draft of the human proteome. Nature, 509(7502), 582–587. http://dx.doi.org/10.1038/nature13319.
Wisniewski, J. R., Zougman, A., & Mann, M. (2009). Combination of FASP and StageTip-based fractionation allows in-depth analysis of the hippocampal membrane proteome. Journal of Proteome Research, 8(12), 5674–5678. http://dx.doi.org/10.1021/pr900748n.
Wu, C. C., MacCoss, M. J., Howell, K. E., & Yates, J. R., 3rd. (2003). A method for the comprehensive proteomic analysis of membrane proteins. Nature Biotechnology, 21(5), 532–538. http://dx.doi.org/10.1038/nbt819.
Zhang, X. (2015). Less is more: Membrane protein digestion beyond urea-trypsin solution for next-level proteomics. Molecular & Cellular Proteomics, 14(9), 2441–2453. http://dx.doi.org/10.1074/mcp.R114.042572.
Zhang, B., Kirov, S., & Snoddy, J. (2005). WebGestalt: An integrated system for exploring gene sets in various biological contexts. Nucleic Acids Research, 33(Web Server issue), W741–W748. http://dx.doi.org/10.1093/nar/gki475.