JPROT-01748; No of Pages 5. Journal of Proteomics XX (2014) XXX–XXX
Editorial
The proteomics quantification dilemma☆
Peter R. Jungblut⁎
Max Planck Institute for Infection Biology, Core Facility Protein Analysis, Berlin, Germany
ARTICLE INFO
Article history:
Received 14 February 2014
Accepted 17 March 2014
Keywords:
Bottom-up proteomics
Top-down proteomics
Protein species
Proteoform
Quantification

ABSTRACT
Proteomics is dominated today by the protein expression discourse, which favors the bottom-up approach because of its high throughput and high sensitivity. For quantification this procedure is misleading if a protein is present with more than one protein species in the sample to be analyzed. The protein speciation discourse considers this more realistic situation and requires top-down procedures, or at least a separation of the protein species prior to identification and quantification. Today all of the top-down procedures are one order of magnitude less sensitive than the bottom-up ones. Increasing sensitivity and throughput are major challenges for proteomics in the coming years.
This article is part of a Special Issue entitled: 20 years of Proteomics.
© 2014 Published by Elsevier B.V.
Before the first Siena Meeting “2-D Electrophoresis: from protein maps to genomes” in 1994, the life sciences, and protein biochemistry in particular, were strictly hypothesis-driven. Acceptance of hypothesis-free approaches came mainly after the first definition of the proteome by Wasinger et al. [1]. The new hope was to find the proteins that are regulated under defined conditions by using global approaches, e.g. by comparing a sample from a patient with a particular disease with a sample from a healthy individual. These regulated proteins were expected to elucidate the biological functions that are perturbed by the disease, resulting in important insights for diagnosis and therapy. Soon proteomics was driven by the protein expression discourse, which implies that it is sufficient to identify a regulated protein.
☆ This article is part of a Special Issue entitled: 20 years of Proteomics. ⁎ Max Planck Institute for Infection Biology, Core Facility Protein Analysis, Charitéplatz 1, 10117 Berlin, Germany.
Many researchers were astonished when the regulation at the protein level did not reproduce the regulation at the transcriptome level, but the protein expression discourse is still common in many investigations. Technologically, it has many advantages. The analysis and quantification can be performed at the peptide level. This procedure was named the “bottom-up approach”. Proteins are first digested and the resulting peptides are separated by liquid chromatography, followed by identification and quantification by mass spectrometry. The advantages of this approach are high sensitivity and high throughput [2–4]. In Fig. 1 this discourse is compared with the protein species discourse, which goes a step deeper into the proteome [5]. After finding 59 spots for heat-shock protein 27 in a human heart biopsy on a high-resolution 2-DE pattern [6], it became clear to us that the protein expression discourse is too simplified. Proteins diversify into protein species after translation, and all those that differ in mass and/or in isoelectric point can be separated by 2-DE. Interestingly, most researchers are still stuck in the protein expression discourse, and even many scientists working with 2-DE are unhappy if they find a protein in more than one spot. For quantification, a mean
http://dx.doi.org/10.1016/j.jprot.2014.03.015 1874-3919/© 2014 Published by Elsevier B.V.
Please cite this article as: Jungblut PR., The proteomics quantification dilemma, J Prot (2014), http://dx.doi.org/10.1016/j.jprot.2014.03.015
[Fig. 1 panels. Protein expression discourse: bottom-up proteomics (1. Digestion of proteins; 2. Analysis of the resulting peptides). Protein species discourse: top-down proteomics (1. Separation of protein species; 2. Analysis of separated protein species).]
Fig. 1 – Comparison of protein expression discourse with protein speciation discourse. The protein expression discourse is focused on the identification of proteins by bottom-up approaches and the protein speciation discourse separates the protein species before analysis and quantification. Parts of the figure were published in [5].
value of the amounts of all spots assigned to the same protein was often calculated in order to obtain information on “protein expression”. We pursued the search for protein species in other proteins and focused on another example of protein speciation, ESAT6 in mycobacteria. This protein was found in 24 spots, 7 of which were identified with 100% sequence coverage [7,8]. The acetylated protein species did not bind to CFP10, although binding to CFP10 is an important function of ESAT6 for moving from the inside of the mycobacterium into the host cell. Protein speciation was found in many different biological materials, even in bacteria [9,10]. Consequently, we sharpened the definition of basic terms in proteomics [5,11]. Protein species are defined chemically, and the name is the umbrella term covering protein isoforms and post-translationally modified forms. Isoforms have to arise genetically. This isoform definition conforms exactly to the definition of the IUPAC [12]. The term “protein expression” should be avoided; it invites simplification of the real situation and is misleading, at least in proteomics. The findings of protein species-specific regulation in an infection model of AGS cells by Helicobacter [13] and during the induction of apoptosis in HeLa cells [14] further confirm the protein speciation discourse. The SILAC-2-DE-MS approach enabled the quantification of 28 protein species of lamin A/C distributed over the whole 2-DE pattern. Some of the protein species were more than 10-fold up-regulated, others more than 10-fold down-regulated. About 50% of the proteins occurred with more than one protein species on the 2-DE pattern. What is the logical consequence of protein speciation for protein quantification? A thought experiment (Fig. 2) provides the answer. What happens if four different protein species of one protein of two biological situations (A and B) are first digested
and the resulting peptide species are then quantified? Before answering this question, the thought experiment will be briefly described. Protein species 1 is present with 100 molecules in situation A and with 100 molecules in situation B, representing a ratio of 1:1, and the other protein species are present with different ratios, as indicated in the upper left window of the figure. The chemical structure of the four protein species is shown on the upper right side. They differ chemically by diverse combinations of a post-translational modification (PTM) in two of the peptides. As shown in the bottom left window, adding up the number of molecules for each peptide species results in a different ratio for each, which is visualized in the column diagram on the bottom right of the figure. What can we learn from the ratio of each peptide species? In all cases the ratio represents a mixture of protein species, and in a real experiment one cannot determine to what extent a peptide species is derived from one protein species or from another. If one protein species has a different function than another, e.g. one is “active” and the other “inactive”, the ratio of the mixed peptide species is a random number without any functional information. Even if the PTMs are enriched, the ratio determined is meaningless as long as more than one position on the protein is modified and more than one protein species is present and differentially regulated. Let us assume protein species 4 is the active molecule and all other protein species are inactive. Analyzing peptide species number 2 results in a ratio of 3.5:2, despite the fact that the ratio of the active protein species is 1:2. Only if peptide species 4 is also quantified does the problem become apparent, because it shows a completely different ratio of 1:3. But it is impossible to determine which ratio best represents the active protein species. To make matters worse, in bottom-up experiments 100% sequence coverage is not realistic, because
Fig. 2 – Thought experiment with four protein species and two biological situations with a given number of molecules for each protein species. Two modification sites are known. Both of them can be modified by the same reaction. Parts of the Figure were published in [20].
with one enzyme only a limited number of peptides falls within the mass window accessible by MS. From these results it has to be concluded that if a protein occurs with more than one protein species, the bottom-up approach is not applicable for protein species quantification. Without any knowledge about the protein species distribution of a protein, the ratio determined for one peptide is a random number, because it represents an undefined mixture of protein species. Is the only information we obtain from bottom-up experiments the identification of protein names and the elucidation of modification sites? It could be argued that protein speciation is a rare phenomenon. However, the examples mentioned already, the protein p53 with 222 potential modifications [15], the histones, each with more than 30 PTMs, and the many PTMs found in UniProt clearly confirm the protein speciation discourse. The complexity caused by different combinations of PTMs (Fig. 3) is unexpectedly high. As few as 5 different modification sites result in 32 different protein species for this protein. If there are 30 PTM sites, more than 1 billion protein species are already possible for one protein. Due to this high potential of diversity alone, the existence of a protein code may be postulated [16]. Another argument to rescue the hypothesis-free bottom-up approaches could be that the model in Fig. 2 is not representative, because in reality proteins are larger and the chances of having many peptides in common in all protein species present are high. However, we again have the problem that all of these peptides represent an undefined mixture of
protein species, some of them representing the active enzymes and others the inactive ones. An increase of all of these peptides in disease, as compared with a control, does not tell us whether the enzymatic activity is increased in the sample representing the disease. If we want to gain information on functionally relevant molecules, we first have to separate them from the inactive ones. The first separation process could focus on an organelle or a protein complex at a defined time point, in order to include the spatial and the time dimensions as well. Then protein analytical methods such as electrophoresis or liquid chromatography may be applied to separate the protein species. Here, we reach the area of top-down experiments. In a full top-down experiment, the complete, undigested protein species is analyzed in a mass spectrometer. An example of a partly top-down approach is the SILAC-2-DE-LC–MS approach already mentioned [14], which separates the protein species first and then identifies the peptides with high sensitivity, in order to obtain information about the protein species. In total, 1200 proteins and 2700 protein species were identified and quantified from a HeLa cell extract, comparing control cells with apoptosis-induced cells. In an example of a top-down approach covering all four dimensions without protein digestion, 1043 proteins and more than 3000 protein species were identified from nuclear and cytosolic HeLa cell protein extracts [17]. In this case, gel-free IEF was combined with gel-free electrophoretic separation according to mass, nano LC and MS.
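The pooling effect described in the thought experiment of Fig. 2 can be sketched numerically. The following Python sketch is an illustration only: the molecule counts are hypothetical and are not the values of Fig. 2 (only the 100:100 count of protein species 1 and the 1:2 ratio of the active protein species 4 follow the text), and the names `SPECIES`, `peptide_counts` and `ratios` are chosen here for illustration. It assumes two modification sites, each lying on its own peptide.

```python
from collections import defaultdict

# Hypothetical counts (molecules per situation). Only species 1 (100 vs 100)
# and the 1:2 ratio of species 4 follow the text; everything else is assumed.
# "sites" flags whether modification site 1 / site 2 carries the PTM.
SPECIES = {
    "species 1": {"sites": (False, False), "A": 100, "B": 100},
    "species 2": {"sites": (True, False), "A": 200, "B": 50},
    "species 3": {"sites": (False, True), "A": 50, "B": 300},
    "species 4": {"sites": (True, True), "A": 100, "B": 200},  # assumed active
}


def peptide_counts(species):
    """Digest all species and pool identical peptide species per situation."""
    pooled = defaultdict(lambda: {"A": 0, "B": 0})
    for info in species.values():
        for site, modified in enumerate(info["sites"], start=1):
            state = "modified" if modified else "unmodified"
            for situation in ("A", "B"):
                pooled[f"peptide {site} ({state})"][situation] += info[situation]
    return dict(pooled)


def ratios(counts):
    """Return the B/A ratio for each entry."""
    return {name: c["B"] / c["A"] for name, c in counts.items()}


# Peptide-level ratios versus the ratio of the (assumed) active protein species.
for name, ratio in ratios(peptide_counts(SPECIES)).items():
    print(f"{name}: B/A = {ratio:.2f}")
active = SPECIES["species 4"]
print(f"active species 4: B/A = {active['B'] / active['A']:.2f}")
```

With these assumed counts, every peptide species pools molecules from two protein species, and none of the four peptide-level ratios reproduces the ratio of the active protein species. Other counts would give other numbers, but the pooling step itself is what removes the information.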
[Fig. 3 labels: DNA, m-RNA, Ribosome, Initial Protein Species.]
Fig. 3 – The diversity of potential protein species for a protein with five phosphorylation sites. If there is only one type of modification, which can occur at n sites in a protein, the number of protein species (s) can be calculated by $s = 2^{n}$. In the case of five phosphorylation sites 32 protein species are possible, which can be obtained by phosphorylation and dephosphorylation.
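The count given in the caption of Fig. 3 can be checked with a short Python sketch. It assumes a single type of modification and sites that can each be modified or unmodified independently of the others; the function names are chosen here for illustration.

```python
from itertools import product


def count_protein_species(n_sites: int) -> int:
    """Number of protein species for n independent on/off modification sites."""
    return 2 ** n_sites


def enumerate_protein_species(n_sites: int):
    """List every combination as a tuple of booleans (True = site modified)."""
    return list(product((False, True), repeat=n_sites))


for n in (5, 30):
    print(f"{n} sites: {count_protein_species(n)} possible protein species")
```

Enumerating the combinations explicitly is only practical for small n; for 30 sites the closed form is the sensible route, since the list would hold more than a billion entries.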
In principle, both examples are four-dimensional separations, because the SILAC-2-DE-LC–MS approach allows the identification of several proteins within one 2-DE spot in the MS step. The gel-free procedure has the advantage of consistently separating the protein species over all four dimensions, whereas the gel approach uses the higher sensitivity of the peptide separation for the last two separations and enables quantification by isotope labeling. The gel approach obtains comparable numbers of proteins and protein species from only one protein sample, whereas the gel-free procedure requires prefractionation into two fractions in order to reach a resolution in the range of 1000 proteins and 3000 protein species. Nagaraj et al. [18] identified 10,255 proteins from HeLa cells, demonstrating a sensitivity about one order of magnitude higher for the bottom-up approach used; however, that approach ignores protein speciation and is therefore limited by the obstacles to quantification. This dilemma can only be solved by improving the sensitivity and throughput of the top-down approach, because the protein species are destroyed in an early step of the bottom-up approach, causing an inherent, fundamental problem that cannot be avoided within the bottom-up procedure. In addition to a deeper understanding of the molecular situation in proteomics, reaching the protein species level will improve the analysis of secondary and tertiary structure and the production and quality control of proteins as therapeutic agents. This will be one of the big challenges of the coming years. I am looking forward to a Siena Meeting with the title “From Proteome to Protein Species” or, according to our American friends from the “Top-down Initiative” (http://topdownproteomics.org/) [19], “From Proteome to Proteoforms”.
Conflict of interests
The author has declared no conflict of interests.

Acknowledgment
The author thanks Rike Zietlow, MPIIB, Department of Molecular Biology, for her editorial help.
REFERENCES
[1] Wasinger VC, Cordwell SJ, Cerpa-Poljak A, Yan JX, Gooley AA, Wilkins MR, et al. Progress with gene-product mapping of the mollicutes: Mycoplasma genitalium. Electrophoresis 1995;16:1090–4.
[2] Cravatt BF, Simon GM, Yates III JR. The biological impact of mass-spectrometry-based proteomics. Nature 2007;450:991–1000.
[3] Cox J, Mann M. Quantitative, high-resolution proteomics for data-driven systems biology. Ann Rev Biochem 2011;80:273–99.
[4] Beck M, Claassen M, Aebersold R. Comprehensive proteomics. Curr Opinion Biotechnol 2011;22:3–8.
[5] Jungblut PR, Holzhütter HG, Apweiler R, Schlüter H. The speciation of the proteome. Chem Cent J 2008;2:16.
[6] Scheler C, Müller E-C, Stahl J, Müller-Werdan U, Salnikow J, Jungblut P. Identification and characterization of heat shock protein 27 protein species in human myocardial 2-DE patterns. Electrophoresis 1997;18:2823–31.
[7] Okkels LM, Müller E-C, Schmid M, Rosenkrands I, Kaufmann SHE, Andersen P, et al. CFP10 discriminates between nonacetylated and acetylated ESAT-6 of Mycobacterium tuberculosis by differential interaction. Proteomics 2004;4:2954–60.
[8] Lange S, Rosenkrands I, Stein R, Andersen P, Kaufmann SH, Jungblut PR. Analysis of protein species differentiation among mycobacterial low-Mr-secreted proteins by narrow pH range immobiline gel 2-DE-MALDI-MS. J Proteomics 2014;97:235–44.
[9] Jungblut PR, Schaible UE, Mollenkopf H-J, Zimny-Arndt U, Raupach B, Mattow J, et al. Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: Towards functional genomics of microbial pathogens. Mol Microbiol 1999;33:1103–17.
[10] Schmidt F, Schmid M, Facius A, Mattow J, Pleissner K-P, Jungblut PR. Iterative data analysis is the key for exhaustive analysis of peptide mass fingerprints from proteins separated by two-dimensional electrophoresis. J Am Soc Mass Spectrom 2003;14:943–56.
[11] Schlüter H, Apweiler R, Holzhütter HG, Jungblut PR. Finding one's way in proteomics: a protein species nomenclature. Chem Cent J 2009;3:11.
[12] Joint commission on biochemical nomenclature IUPAC–IUBMB: nomenclature of multiple forms of enzymes. In: Liébecq C, editor. Biochemical nomenclature and related documents. 2nd ed. Colchester: Portland Press; 1992.
[13] Holland C, Schmid M, Zimny-Arndt U, Rohloff J, Stein R, Jungblut PR, Meyer TF. Quantitative phosphoproteomics reveals link between Helicobacter pylori infection and RNA splicing modulation in host cells. Proteomics 2011;11:2798–811.
[14] Thiede B, Koehler CJ, Strozynski M, Treumann A, Stein R, Zimny-Arndt U, et al. High resolution quantitative proteomics of HeLa cells protein species using stable isotope labeling with amino acids in cell culture (SILAC), two-dimensional gel electrophoresis (2DE) and nano-liquid chromatography coupled to an LTQ-Orbitrap mass spectrometer. Mol Cell Proteomics 2013;12:529–38.
[15] DeHart CJ, Chahal JS, Flint SJ, Perlman DH. Extensive post-translational modification of active and inactivated forms of endogenous p53. Mol Cell Proteomics 2014;13(1):1–17. http://dx.doi.org/10.1074/mcp.M113.030254.
[16] Sims III RJ, Reinberg D. Is there a code embedded in proteins that is based on post-translational modifications? Nat Rev 2008;9:1–6.
[17] Tran JC, Zamdborg L, Ahlf DR, Lee JE, Catherman AD, Durbin KR, et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 2011;480(7376):254–8.
[18] Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, et al. Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol 2011;7:548.
[19] Smith LM, Kelleher NL. Consortium for top-down proteomics. Proteoform: a single term describing protein complexity. Nat Methods 2013;10:186–7.
[20] Jungblut PR, Schlüter H. Proteinbiochemie – proteinanalyse – proteomics. Biospektrum 2008;14:37–40.

Dr. Peter Roman Jungblut has been the leader of the Core Facility for Protein Analysis at the Max Planck Institute for Infection Biology (MPIIB) in Berlin since 1996, with a focus on microbial, eye lens and proteasomal proteomics. He worked from 1982 to 1990 in the lab of Joachim Klose at the Free University of Berlin to develop the experimental and theoretical prerequisites for systematic protein analysis (today named proteomics). From 1990 to 1996 he constructed a human heart proteome database at the German Heart Center and at the Max Delbrück Center in Berlin. Together with Brigitte Wittmann-Liebold, PRJ founded one of the first proteomics companies, WITA GmbH, in 1992. He was a lecturer at the University of Innsbruck, the Max Planck Institute for Molecular Genetics, Berlin, and Beuth Hochschule Berlin. During the construction of the web-accessible MPIIB 2D-PAGE database system containing proteomes of several organisms, it became clear to him that the protein species level is the real molecular level of proteomics. The scientific work of PRJ is accessible in more than 170 refereed publications about proteomics and was presented in more than 150 lectures worldwide.