CHAPTER 9
Proteomic Analyses
Fernando J. Corrales1, Leticia Odriozola2
1Functional Proteomics Laboratory, Centro Nacional de Biotecnología, CSIC, CIBERhed-ISCIII, PRB2, ProteoRed-ISCIII, Madrid, Spain; 2CIMA, University of Navarra, Pamplona, Spain
Glossary Terms
Biomarker: Biomolecule indicating a particular biological condition or process.
Electrophoresis: Separation of biomolecules by their different behaviors in an electric field.
Liquid chromatography: Separation of biomolecules by their different degrees of interaction with a given sorbent.
Mass spectrometry: Analytical technique that separates ions based on their mass-to-charge ratio.
Proteoform: Any of the different protein species encoded by a given gene.
Proteome: Complete set of protein species of a cell or tissue.
Proteomics: Science that studies the proteome.

INTRODUCTION
The massive amount of information generated by the many genome-wide sequencing projects has greatly improved our interpretation of human biology in health and disease. However, the sequence of the 3,120,000,000 base pairs that make up the human genome cannot in itself explain the biological complexity of the human body. The roughly 20,000 human protein-coding genes and the as yet uncounted non-protein-coding elements are common to more than 200 differentiated cell types that specialize in performing specific functions. The genome must be considered the first information layer of a sophisticated network that configures structural and functional diversity; on its own, it is not enough to provide a full understanding of phenotypes or of the dynamic regulatory mechanisms orchestrating adaptive responses and pathogenic processes. Thus, in addition to specific gene expression programs and splicing events, a definition of protein profiles and their regulation is essential to translate genes into biological functions. The realization that proteins are the functional cellular effectors, together with the availability of unprecedented technological resources, has moved the scientific community to develop proteomics in an attempt to characterize the proteinaceous component, or proteome, of a biological system. Proteomics has expanded enormously and has attracted the interest of many scientists because proteins drive cellular pathways in addition to regulating them. In addition, the human proteome is the universe in which novel biomarkers and nutritional/therapeutic targets are to be discovered; these are expected to support population stratification and to promote the development of individualized interventions tailored to the specific needs of particular individuals.

The term “proteome” was initially coined by Wilkins in 1995, although the overall concept of proteomics was introduced in modern biology in 1982 by Anderson, who proposed a human protein index based on two-dimensional electrophoresis (2DE). Accordingly, proteomics can be defined as a discipline focused on the study of the proteome using technologies that allow for large-scale analysis. The proteome can be defined as the set of proteins expressed by a genome, but this concept requires additional considerations if the proteome is to be regarded in its whole dimension. The proteome is highly dynamic; it must be defined within the genetic and environmental constraints that determine the repertoire of protein species. The plasticity of the proteome is essential to finely tune cellular functions through control mechanisms beyond those that are merely transcriptional, enabling fast and efficient reactions to external stimuli, orchestrated in adaptive responses that ensure cell survival. It would be easy to define the proteome as a simple translation of the 20,000 human genes, but the different mechanisms of posttranscriptional regulation make it necessary to revise the rule of one gene, one protein. RNA transcript splicing, protein processing, and posttranslational modifications (PTMs) (more
Principles of Nutrigenetics and Nutrigenomics https://doi.org/10.1016/B978-0-12-804572-5.00009-4
Copyright © 2020 Elsevier Inc. All rights reserved.
[Figure 9.1 flowchart; steps recovered from the figure labels: organism (experimental design, definition of groups); sample collection and storage (normalized strategies to ensure analytical reproducibility); metabolic labeling (SILAC); protein extraction and solubilization; proteome fractionation (optional strategy to decrease proteome complexity); protein/peptide separation and identification (protein separation by 1D electrophoresis, 2D electrophoresis, or DIGE, followed by differential analysis and protein digestion; or protein digestion, peptide isotopic labeling with ICAT, iTRAQ, or H₂¹⁸O, and peptide separation and comparative analysis by multidimensional liquid chromatography); MS and MS/MS analysis; confirmation of differential proteins (target confirmation with alternative methods such as Western blot, ELISA, SRM, etc.); validation of biomarkers (population screening to assess sensitivity and specificity); development of detection devices with clinical application (quantitative detection of biomarkers based on easy, reliable, and reproducible methods).]
FIGURE 9.1 Schematic representation of the different steps of a proteomics pipeline. 1D, one-dimensional; DIGE, difference gel electrophoresis; ELISA, enzyme-linked immunosorbent assay; ICAT, isotope-coded affinity tag; iTRAQ, isobaric tags for relative and absolute quantitation; MS, mass spectrometry; SILAC, stable isotopic labelling with amino acids in cell culture; SRM, selected reaction monitoring.
than 300 different types of PTMs have been described) generate the diversity of gene products needed to define phenotypes and increase the complexity of the proteome landscape to at least hundreds of thousands of proteoforms. The huge dimension of the proteome, the physicochemical heterogeneity of proteins, and the wide dynamic range across protein species in biological matrices (up to 10 orders of magnitude) make it necessary to build sophisticated pipelines that integrate different analytical strategies to study the proteome or specific subproteomes in depth. A typical proteomics experiment can be structured in four steps: sample collection, handling, and storage; protein extract preparation; protein/peptide separation; and protein/peptide identification, characterization, and quantitation. The sample type as well as the aims of the study will determine the optimal combination of analytical methods (Fig. 9.1). One critical issue for a successful proteomic experiment is the design of sample collection and storage procedures that maintain proteome integrity and minimize sample heterogeneity. In this regard, interaction with biobanks (BBMRI-ERIC) (http://www.bbmri-eric.eu) is of utmost importance to obtain access to reliable sample collections generated under standardized protocols. Second, proteins must be solubilized while minimizing contamination with other biomaterials (lipids, nucleic acids, etc.). Although many standardized and reproducible procedures are available, no protocol provides an efficient extraction of all proteins of a given biological sample. High concentrations of chaotropic agents, nonionic detergents, and reducing agents are common components of extraction buffers because they allow efficient solubilization of proteins (even hydrophobic ones) while preventing artifactual oxidation events during manipulation. However, the method must be adapted to the requirements imposed by the sample type (cell, tissue type, storage conditions, etc.), the subsequent analytical steps (avoiding interfering substances), and the aims of the study, which may require protein/peptide labeling, subcellular fractionation, enrichment of specific protein/peptide families, and so on, as will be discussed in the following sections. Especially relevant in biomedical research is comparative proteomics, because it enables the identification of protein mediators of biological responses or pathogenic processes. Different combinations of protein/peptide separation strategies and mass spectrometry (MS) approaches provide reliable and accurate workflows both for the unsupervised, extensive proteome analyses generally used in discovery studies and for the validation of the newly generated hypotheses.
I. THE BIOLOGICAL BASIS OF HERITABILITY AND DIVERSITY
BOTTOM-UP PROTEOMICS
Gel-based Strategies
A differential proteomics analysis can follow a gel-based or a gel-free approach. In the first case, proteins are separated by 2DE, which combines two orthogonal separations based on two physical properties of proteins: the isoelectric point in the first dimension (isoelectric focusing) and the molecular weight under denaturing conditions (sodium dodecyl sulfate-polyacrylamide gel electrophoresis [SDS-PAGE]) in the second (Rabilloud, 2011). After Coomassie, silver, or fluorescent staining, gels are commonly scanned, and the digitized replicates are subjected to image analysis for spot detection and normalization, followed by statistical analysis to determine the differential spots across the experimental conditions under comparison. A substantial improvement of this method was introduced by two-dimensional fluorescence difference gel electrophoresis, in which proteins from the samples under study are differentially labeled using specific fluorescent dyes (Cy2, Cy3, or Cy5), mixed in equal amounts in a pairwise mode, and resolved by 2DE. Scanning gels with laser beams at different wavelengths, as required for optimal excitation of each Cy dye, generates individual images of the protein spot map corresponding to each sample. Protein levels can then be compared based on the relative intensity of the spots within a single gel, which reduces variability and improves the accuracy of the method. Differential spots are then excised and incubated with proteases. Trypsin is the most commonly used because tryptic digestion is robust and generates peptides that carry a positively charged residue (K or R) at the C terminus at acidic pH, although other proteases or even combinations of two proteolytic enzymes have also been implemented to circumvent the limitations of trypsin digestion in the case of certain proteins. The resulting peptide mixture is analyzed by mass spectrometric methods to enable protein identification.
A simple approach to identifying proteins is peptide mass fingerprinting (PMF), in which tryptic digests are analyzed by matrix-assisted laser desorption/ionization (MALDI) time of flight (TOF) MS and the identifications are deduced from the specific set of peptide m/z ratios upon comparison with the theoretical PMFs of genome-wide protein sequence databanks (Karas and Hillenkamp, 1988). The use of MALDI TOF equipment greatly increased the reliability of identification. Alternatively, peptides can be separated by liquid chromatography (LC) (nanoLC) and analyzed by tandem MS (MS/MS), which provides information on the amino acid sequence (Fenn et al., 1989). The chromatographic column is connected online with the MS/MS instrument through a nanoelectrospray ionization (ESI) source that ionizes the eluted material. Once ionized, a particular peptide ion is isolated and fragmented by collision with an inert gas, generating a fragment spectrum that is compared with the theoretically predicted fragments for all peptide sequences in a database to identify the peptide sequence and then the precursor protein. 2DE is a relatively simple method that enables the visualization and analysis of 2000-3000 protein spots, the detection of posttranslationally modified species in which the modification alters the protein isoelectric point (pI), and an estimation of differentially expressed proteins. However, some constraints are also associated with 2DE: low-abundance proteins (below 1000 copies), hydrophobic proteins, and proteins with extreme pI or Mr are barely detected, which prevents a complete description of the proteome.
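The comparison between measured peptide masses and theoretical fingerprints described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any particular search engine: the two database sequences are made up, the cleavage rule is the simplified "after K or R, except before P" rule with no missed cleavages, the 0.2 Da tolerance is an arbitrary assumption, and the score is a plain count of matching masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the summed residues
PROTON = 1.007276   # mass of the charge-carrying proton


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def singly_charged_mz(peptide):
    """m/z of the [M+H]+ ion, the species usually observed in MALDI."""
    return peptide_mass(peptide) + PROTON


def pmf_score(observed_mz, sequence, tol=0.2):
    """Count observed m/z values that match a theoretical tryptic peptide."""
    theoretical = [singly_charged_mz(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed_mz)


def best_match(observed_mz, database, tol=0.2):
    """Return the database entry whose theoretical fingerprint matches best."""
    return max(database, key=lambda name: pmf_score(observed_mz, database[name], tol))


# Toy database of two made-up sequences (hypothetical, for illustration only).
TOY_DATABASE = {"protA": "MKWVTFISLLR", "protB": "GASPKVTCLR"}
observed = [singly_charged_mz(p) for p in tryptic_digest(TOY_DATABASE["protA"])]
print(best_match(observed, TOY_DATABASE))
```

Real search engines additionally handle missed cleavages, modifications, and statistical scoring, which this sketch deliberately leaves out.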
Shotgun Analysis
Shotgun proteomics emerged to circumvent some of the major constraints of 2DE; it proposes a gel-free alternative based on the tandem mass spectrometric analysis (MS/MS) of all peptides resulting from the proteolytic digestion of whole-protein mixtures, after multidimensional LC separations (Zhang, 2013; Gillet et al., 2016). Hence, protein identification and quantitation are deduced from MS and MS/MS data. For differential analyses, two major workflows are available, relying either on protein/peptide stable isotope labeling or on a label-free strategy. In the first case, proteins can be metabolically labeled in cell culture (stable isotopic labeling with amino acids in cell culture [SILAC]), or peptide mixtures from different samples can be labeled chemically (isotope-coded affinity tag [ICAT]; isobaric tags for relative and absolute quantitation [iTRAQ]) or enzymatically (digestion in H₂¹⁸O) using light and heavy isotopes, and then combined in equal amounts for subsequent simultaneous nanoLC-ESI-MS/MS analysis. To increase the resolution of the experiment, different prefractionation strategies are routinely implemented. Protein extracts can be resolved in SDS-PAGE gels, followed by in-gel protein digestion of a variable number of gel lane slices; this provides fractionated peptide pools that are subsequently analyzed by nanoLC (conventionally reversed-phase C18 chromatography) coupled online with the mass spectrometer through an ESI source. Alternatively, the whole labeled peptide population resulting from the combination of all protein extracts can first be resolved by cation exchange or by C18 LC at high pH, and the resulting fractions analyzed by LC-ESI-MS/MS, as indicated earlier. MS and MS/MS data allow peptides to be assigned to an amino acid sequence based on information inferred from the fragments generated most frequently by collision-induced dissociation (CID), but also by electron capture dissociation (ECD) or electron transfer dissociation (ETD), among other fragmentation strategies; they also allow for the identification of PTM sites. Search engines are software tools that assign sequences from spectral information and identify proteins with statistical significance (Cottrell, 2011). Relative quantification is achieved in parallel from the intensity ratio of differentially labeled precursor ions (ICAT, for instance) or of reporter ions generated upon fragmentation (iTRAQ). Label-free quantitation involves the independent analysis of at least three replicates of each sample, usually with no previous fractionation, and the abundance ratio is estimated from the precursor signal intensity or from spectral counting (spectral counting counts the number of spectra identified for a given peptide in different biological samples and integrates the results for all measured peptides of the protein[s] being quantified).
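As a minimal sketch of the spectral counting idea just described, the snippet below tallies identified spectra per protein in each replicate, averages over replicates, and reports a log2 abundance ratio between two conditions. The input layout (one `(protein, peptide)` pair per identified spectrum), the pseudocount of 1, and all protein and peptide names are assumptions made for illustration; published label-free workflows typically add normalization and statistical testing that are omitted here.

```python
import math
from collections import Counter


def spectral_counts(identified_spectra):
    """Count spectra per protein; input is one (protein, peptide) pair per spectrum."""
    return Counter(protein for protein, _peptide in identified_spectra)


def mean_count(replicates, protein):
    """Average spectral count of a protein over replicate runs."""
    return sum(rep.get(protein, 0) for rep in replicates) / len(replicates)


def log2_ratio(control_reps, treated_reps, protein, pseudocount=1.0):
    """log2(treated/control) of mean spectral counts; the pseudocount avoids division by zero."""
    control = mean_count(control_reps, protein) + pseudocount
    treated = mean_count(treated_reps, protein) + pseudocount
    return math.log2(treated / control)


# Hypothetical data: three replicates per condition, as the text recommends.
control = [
    spectral_counts([("P1", "pepA")] * 3 + [("P1", "pepB")] * 1),
    spectral_counts([("P1", "pepA")] * 3 + [("P1", "pepB")] * 2),
    spectral_counts([("P1", "pepA")] * 2 + [("P1", "pepB")] * 1),
]
treated = [
    spectral_counts([("P1", "pepA")] * 6 + [("P1", "pepB")] * 3),
    spectral_counts([("P1", "pepA")] * 6 + [("P1", "pepB")] * 4),
    spectral_counts([("P1", "pepA")] * 5 + [("P1", "pepB")] * 3),
]
print(log2_ratio(control, treated, "P1"))
```

Counting at the protein level, as done here, integrates the spectra of all peptides assigned to that protein.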
TOP-DOWN PROTEOMICS
Bottom-up proteomics methods have greatly extended our knowledge of the human proteome, but an analysis of fragments derived from enzymatic or chemical digestion of proteins has disadvantages. Peptides might not be specific to a protein or protein form, leading to inconclusive identification; large protein regions may not be mapped, losing relevant information such as PTMs and sequence variants; and the relations among variations or modifications within the same polypeptide chain are lost. In this scenario, top-down MS proposes an analysis of intact proteins to provide an overview of all intact proteoforms, PTM localization, and sequence variants (Catherman, 2014). The large complexity of most proteomic samples imposes one or more offline separation steps (either electrophoretic or chromatographic) before LC-MS/MS analysis. Top-down studies require high-performance mass spectrometers because high resolution and mass accuracy (in some cases, at the millidalton level) are critical to separate and assign precursor spectra containing multiple intact proteoforms or MS2 spectra with hundreds of fragment ions. Moreover, high sensitivity is also needed because proteins have broad isotopic distributions, which scatter the signal of a given protein across many peaks. The use of ECD and ETD provides fragmentation patterns complementary to CID and increases the capacity to dissect molecular complexity, facilitating the quantification of proteoforms, localization of PTMs and variations, assignment of positional isomers, and assessment of PTM interdependence. Several challenges remain for top-down proteomics, including protein solubilization, separation, and the detection of large intact proteins. Nevertheless, improved separation tools, new mass spectrometers with unprecedented capacity, and the computational tools already available to identify and characterize proteoforms reliably allow the study of complex proteomes and are expanding the use of top-down proteomics in biomedical research (Toby et al., 2016).
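To illustrate why mass accuracy at the millidalton level matters for intact proteoforms, the sketch below tests which modification could explain an observed intact mass at two different tolerances. The example is an assumption chosen for illustration and is not taken from the chapter: acetylation and trimethylation add almost the same mass (they differ by about 0.036 Da), so on a hypothetical 15 kDa protein they are separated by only about 2.4 ppm. The base mass and the list of candidate modifications are made up.

```python
# Monoisotopic mass shifts (Da) of three common modifications.
PTM_SHIFTS = {
    "acetyl": 42.010565,
    "trimethyl": 42.046950,
    "phospho": 79.966331,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidate_modifications(observed_mass, base_mass, tol_ppm):
    """Modifications whose theoretical intact mass lies within tol_ppm of the observation."""
    return [
        name
        for name, shift in PTM_SHIFTS.items()
        if abs(ppm_error(observed_mass, base_mass + shift)) <= tol_ppm
    ]


BASE_MASS = 15000.0                           # hypothetical unmodified protein mass (Da)
observed = BASE_MASS + PTM_SHIFTS["acetyl"]   # simulate an acetylated proteoform

print(candidate_modifications(observed, BASE_MASS, tol_ppm=5.0))  # ambiguous assignment
print(candidate_modifications(observed, BASE_MASS, tol_ppm=1.0))  # unambiguous assignment
```

At the looser tolerance both acetylation and trimethylation remain plausible, whereas the tighter tolerance leaves a single candidate.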
TARGETED PROTEOMICS
Nontargeted MS-based proteomics has been embraced by researchers studying human biology and disease. These wide-screening procedures offer a panoramic picture of the relative expression and modification of large numbers of proteins and are increasingly used to generate de novo functional hypotheses and to identify targets in discovery studies. However, they are not optimal for absolute quantitative analysis of predefined targets or for testing the selected targets across large numbers of samples. Complementary procedures such as sequential window acquisition of all theoretical mass spectra (SWATH MS) and selected reaction monitoring (SRM) have emerged as methods of choice for MS-based targeted proteomics.
Selected Reaction Monitoring
Any protein can theoretically be detected and quantitated in any biological matrix by selectively recording precursor-to-fragment transitions of its proteotypic peptides, which are specific to the target protein and allow its unequivocal identification. The sample is prepared according to the general procedures mentioned earlier, and peptides are prefractionated when required or directly analyzed by nanoLC-ESI-MS/MS. Triple quadrupole instruments are typically used; they filter selected ions in the first and third quadrupoles, with the second serving as the collision cell. The number of proteins that can be monitored in a single method is inversely related to the sensitivity and accuracy required in the analysis. Specific precursor ions (peptides) and transitions are then selected to configure the SRM method, which allows specific identification and accurate quantitation (Anderson, 2006). The use of preexisting MS information from SRM data and from multiple shotgun analyses available in databanks including SRMAtlas (Kusebauch et al., 2016; http://www.srmatlas.org), GPMDB (http://www.thegpm.org), and PRIDE (http://www.ebi.ac.uk/pride/) significantly reduces the time required to develop the assay. Because specific m/z values for the parent and fragment ions are specified in the SRM method, modified peptides will not be detected unless specifically targeted. SRM has emerged as a reference method for protein quantification because it has a high dynamic range, can provide absolute rather than relative quantification, and is reproducible and reliable even across different laboratories when stable isotope-labeled standards are spiked into the sample. For these reasons, it is increasingly used to characterize model systems and clinical samples and in toxicology studies, among others (Kearney et al., 2018). The possibility of multiplexing and refining SRM assays for optimal performance in sample screening programs has made this strategy an attractive alternative to antibody-based methods for targeted protein analysis.
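The two ingredients of an SRM assay described above, a list of precursor-to-fragment transitions and quantitation against a spiked stable isotope-labeled standard, can be sketched as follows. The peptide sequence, the choice of y ions, the peak areas, and the rule of summing transition areas before taking the light/heavy ratio are all assumptions for illustration; actual assays are usually built and refined with dedicated software and empirical data.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide ion selected in the first quadrupole."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y ion containing the last n residues."""
    fragment = peptide[-n:]
    return sum(RESIDUE_MASS[aa] for aa in fragment) + WATER + PROTON


def transitions(peptide, charge=2, y_ions=(3, 4, 5)):
    """List of (precursor m/z, fragment m/z) pairs to monitor for one peptide."""
    q1 = precursor_mz(peptide, charge)
    return [(q1, y_ion_mz(peptide, n)) for n in y_ions]


def quantify(light_areas, heavy_areas, heavy_amount_fmol):
    """Endogenous amount from the light/heavy area ratio and the spiked heavy amount."""
    return sum(light_areas) / sum(heavy_areas) * heavy_amount_fmol


PEPTIDE = "LVNEVTEFAK"  # example sequence, assumed here to be proteotypic
for q1, q3 in transitions(PEPTIDE):
    print(f"Q1 {q1:.4f} -> Q3 {q3:.4f}")

# Hypothetical peak areas for two transitions and 10 fmol of spiked heavy standard.
print(quantify([1000.0, 2000.0], [500.0, 1000.0], heavy_amount_fmol=10.0))
```

Because the heavy standard coelutes with the endogenous peptide and differs only in mass, the area ratio converts directly into an amount once the spiked quantity is known.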
SWATH Mass Spectrometry
A main limitation of SRM is that the number of targeted measurements in a single run is restricted to about 100. SWATH MS has emerged as an alternative method to mitigate this restriction. It is a data-independent acquisition MS technology based on targeted data extraction instead of targeted MS acquisition, as in SRM (Liu, 2013). In SWATH MS, MS2 data are recorded by cycling through 32 consecutive 25-Da precursor isolation windows, and all fragment ions are monitored with high mass accuracy (10 parts per million), ensuring the specificity of peptide identification. The result is a digital map of the MS2 ions derived from the fragmentation of all precursor ions present in the sample in a given m/z window at a given time. Fragment ion chromatograms for each peptide of interest can then be extracted from the digital maps to perform the targeted analysis. SWATH MS measurements rely on reference spectral libraries that collect a large set of validated peptide MS2 spectra (PeptideAtlas, Human Proteinpedia, GPM Proteomics Database, PRIDE, etc.). Combining SWATH MS data with reference maps supports high-sensitivity detection and proteome coverage, gives accurate and reproducible quantification, and provides permanent maps that can be reinterrogated to validate emerging biomarkers or hypotheses. The possibility of monitoring all proteins of a sample at constant sensitivity and reproducibility in large cohorts, and the capacity to obtain a personalized proteomic profile from SWATH maps of different tissues for iterative biomarker studies, will significantly extend the application of this technology in clinical research.
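The windowing scheme and the targeted extraction step can be sketched as below. The 32 windows of 25 Da and the 10 ppm tolerance come from the text; the starting m/z of 400 (which gives a 400 to 1200 range), the in-memory layout of the scans, and all peak values are assumptions for illustration only.

```python
def make_windows(start=400.0, width=25.0, count=32):
    """Consecutive precursor isolation windows as (lower, upper) m/z bounds."""
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]


def window_for(precursor_mz, windows):
    """Index of the isolation window containing a precursor, or None if outside the range."""
    for index, (lower, upper) in enumerate(windows):
        if lower <= precursor_mz < upper:
            return index
    return None


def within_ppm(observed_mz, target_mz, ppm=10.0):
    """True if the observed m/z lies within the ppm tolerance of the target."""
    return abs(observed_mz - target_mz) <= target_mz * ppm * 1e-6


def extract_chromatogram(scans, window_index, fragment_mz, ppm=10.0):
    """Fragment ion trace: (retention time, summed intensity) for one window.

    Each scan is (retention_time, window_index, [(mz, intensity), ...]).
    """
    trace = []
    for retention_time, index, peaks in scans:
        if index != window_index:
            continue
        intensity = sum(i for mz, i in peaks if within_ppm(mz, fragment_mz, ppm))
        trace.append((retention_time, intensity))
    return trace


windows = make_windows()
target_window = window_for(512.3, windows)  # hypothetical precursor m/z

# Hypothetical MS2 scans from two cycles through the windows.
scans = [
    (10.0, target_window, [(365.2180, 50.0), (700.1000, 5.0)]),
    (10.0, target_window + 1, [(365.2180, 999.0)]),
    (13.2, target_window, [(365.2185, 120.0), (365.3000, 30.0)]),
]
print(extract_chromatogram(scans, target_window, fragment_mz=365.2183))
```

Because every precursor window is fragmented in every cycle, the same recorded map can be queried later for peptides that were not of interest when the data were acquired.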
PROTEIN REGULATION: POSTTRANSLATIONAL MODIFICATION ANALYSIS BASED ON ENRICHMENT PROCEDURES
PTMs enable the rapid and efficient modulation of biological functions and promote adaptive responses to environmental conditions that ensure cell survival. PTMs result from the covalent bonding of different chemical moieties to specific amino acid side chains. Massive PTM screening is especially challenging because the limited stability of some PTMs and the substoichiometric proportion of the modified species hinder their detection and quantification. Protocols to preserve specific PTMs and enrichment procedures to increase the relative concentrations of modified proteins are needed to increase the depth of the study substantially.
Protein Phosphorylation
Reversible protein phosphorylation underlies a complex network of signaling and regulatory events that affects virtually every cellular process. Our understanding of phosphorylation networks in the physiological context remains limited, largely owing to the inefficient isolation of serine/threonine-phosphorylated peptides. The application of enrichment approaches at the level of phosphorylated proteins and/or peptides, using a combination of ion exchange and immobilized metal affinity or TiO2 chromatography, or phosphotyrosine-specific antibodies, has allowed the identification of thousands of unique phosphorylation sites. These sites configure a detailed map to guide the comprehensive exploration of the human biological functions that are regulated by phosphorylation (http://kinase.com/human/kinome/).
Protein Acetylation
Protein lysine acetylation is a finely tuned process controlled by acetyltransferases and deacetylases that, like other PTMs, regulates a variety of physiological processes including enzyme activity, protein-protein interactions, gene expression, and subcellular localization; it has been associated with numerous disease states including cancer, aging, and cardiovascular, metabolic, and alcoholic liver disease. Characterization of the acetyl-proteome has been greatly extended by combining antibody-based enrichment of lysine (K)-acetylated proteins/peptides with MS instruments of high sensitivity and accuracy.
Protein Glycosylation
Protein glycosylation is the most versatile and common protein modification and has important roles in various biological processes and in disease progression, including cancer. The identification of glycopeptides in a complex protein digest is difficult and requires enrichment and isolation before MS analysis. In this regard, various strategies are used, such as the periodate/hydrazide protocol and hydrophilic and lectin-mediated affinity chromatography. Lectin affinity chromatography has been extensively used to isolate glycoproteins and glycans from various biological sources such as plasma, bile, urine, and cells. These enrichment approaches appear to be complementary because little overlap is observed when they are applied to the same sample, and they greatly enhance the MS signals of glycopeptides, facilitating the sequence and glycoside assignments in typical shotgun analyses. An additional difficulty of analyzing N-glycopeptides by MS is their size, which is often far from the preferred mass range of standard shotgun proteomics. Digestion with proteases in addition to trypsin partially circumvents these limitations, which result from primary structure constraints as well as from the effect of the modifying carbohydrate moiety, which might block the access of trypsin to an adjacent K or R residue. Finally, a targeted method to determine N-glycosylation occupancy was developed based on the conversion of asparagine (N) to aspartate (D) that occurs upon peptide-N-glycanase F digestion (this amino acid change introduces a mass difference of 0.984 Da). Combinations of these experimental approaches have been delineated to configure optimized workflows aimed at studying the liver glycoproteome, to define novel mechanisms of liver disease progression, mainly hepatocellular carcinoma, and to identify potential biomarkers.
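The occupancy measurement based on the N-to-D conversion can be sketched as follows. The 0.984 Da shift is derived here from the residue masses of asparagine and aspartate. Two elements are assumptions added for illustration and are not stated in the chapter: the search for the canonical N-X-S/T sequon (X being any residue except proline) as a way to locate candidate sites, and the estimate of occupancy as the intensity fraction of the converted peptide. Spontaneous deamidation, which produces the same mass shift, is ignored in this sketch.

```python
import re

ASN_RESIDUE = 114.04293   # monoisotopic residue mass of asparagine (N)
ASP_RESIDUE = 115.02694   # monoisotopic residue mass of aspartate (D)
N_TO_D_SHIFT = ASP_RESIDUE - ASN_RESIDUE  # about 0.984 Da


def find_sequons(sequence):
    """0-based positions of N in N-X-S/T motifs, where X is not proline."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", sequence)]


def converted_mass(peptide_mass, n_sites=1):
    """Peptide mass after N-to-D conversion at n_sites formerly glycosylated positions."""
    return peptide_mass + n_sites * N_TO_D_SHIFT


def occupancy(converted_intensity, unconverted_intensity):
    """Fraction of the site that carried a glycan before deglycosylation."""
    total = converted_intensity + unconverted_intensity
    return converted_intensity / total if total else 0.0


print(find_sequons("MNGTANPSNKT"))   # made-up sequence
print(round(N_TO_D_SHIFT, 3))
print(occupancy(converted_intensity=30.0, unconverted_intensity=10.0))
```

In a targeted assay, the two intensities would be measured on the D-containing and N-containing forms of the same peptide after peptide-N-glycanase F treatment.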
CONCLUDING REMARKS AND FUTURE PERSPECTIVES
Proteomics has experienced remarkable improvements, providing tools to analyze complex proteomes with unprecedented sensitivity and accuracy. Hundreds to thousands of proteins can be analyzed in a single experiment, resulting in a precious source of data to understand human biology and disease. The identification of pivotal proteins driving cellular pathways and pathogenic processes sheds light on the underlying molecular mechanisms and reveals biochemical changes that enable the development of predictive biomarkers, which can trigger earlier nutritional and clinical interventions. Moreover, the validation of candidate biomarkers requires targeted methods to circumvent the constraints inherent to the vast complexity of the proteome. Tailored methods to quantitate specific proteins facilitate the analysis of large cohorts of clinical samples and can test the translational relevance of the differential features identified. Selective procedures to enrich specific subproteomes, together with targeted hypothesis-driven analysis, are increasingly being used by the biomedical community and are starting to bridge the gap between analytical and clinical proteomics.
References
Anderson, L., Hunter, C.L., 2006. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteom 5, 573-588.
Catherman, A.D., Skinner, O.S., Kelleher, N.L., 2014. Top down proteomics: facts and perspectives. Biochem Biophys Res Commun 445, 683-693.
Cottrell, J.S., 2011. Protein identification using MS/MS data. J Proteomics 74, 1842-1851.
Fenn, J.B., Mann, M., Meng, C.K., Wong, S.F., Whitehouse, C.M., 1989. Electrospray ionization for mass spectrometry of large biomolecules. Science 246 (4926), 64-71.
Gillet, L.C., Leitner, A., Aebersold, R., 2016. Mass spectrometry applied to bottom-up proteomics: entering the high-throughput era for hypothesis testing. Annu Rev Anal Chem 9, 449-472.
Karas, M., Hillenkamp, F., 1988. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 60 (20), 2299-2301.
Kearney, P., Boniface, J.J., Price, N.D., Hood, L., 2018. The building blocks of successful translation of proteomics to the clinic. Curr Opin Biotechnol 51, 123-129.
Kusebauch, U., Campbell, D.S., Deutsch, E.W., Chu, C.S., Spicer, D.A., Brusniak, M.Y., Slagel, J., Sun, Z., Stevens, J., Grimes, B., Shteynberg, D., Hoopmann, M.R., Blattmann, P., Ratushny, A.V., Rinner, O., Picotti, P., Carapito, C., Huang, C.Y., Kapousouz, M., Lam, H., Tran, T., Demir, E., Aitchison, J.D., Sander, C., Hood, L., Aebersold, R., Moritz, R.L., 2016. Human SRMAtlas: a resource of targeted assays to quantify the complete human proteome. Cell 166, 766-778.
Liu, Y., Hüttenhain, R., Collins, B., Aebersold, R., 2013. Mass spectrometric protein maps for biomarker and clinical research. Expert Rev Mol Diagn 13, 811-825.
Rabilloud, T., Lelong, C., 2011. Two-dimensional gel electrophoresis in proteomics: a tutorial. J Proteomics 74, 1829-1841.
Toby, T.K., Fornelli, L., Kelleher, N., 2016. Progress in top-down proteomics and the analysis of proteoforms. Annu Rev Anal Chem 9, 499-519.
Zhang, Y., Fonslow, B.R., Shan, B., Baek, M.C., Yates 3rd, J.R., 2013. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 113, 2343-2394.