CHAPTER 2

Proteomic and mass spectrometry technologies for biomarker discovery

Andrei P. Drabovich (a), Maria P. Pavlou (b), Ihor Batruch (a), Eleftherios P. Diamandis (a,b,c,d)

(a) Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, Canada
(b) Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
(c) Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
(d) Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, ON, Canada
OUTLINE

Introduction
Protein biomarker discovery and development pipeline
Proteomic samples
Protein identification using mass spectrometry
    Protein digestion
    Protein and peptide separation techniques
    Protein and peptide ionization techniques
    Mass spectrometry instrumentation
    Deconvolution and database search of tandem mass spectra
    Posttranslational modifications as disease biomarkers
Protein quantification using mass spectrometry
    Label-free quantification
    Metabolic and enzymatic labeling
    Chemical labeling
    Selected reaction monitoring assays
    Separation and enrichment strategies for quantification of low-abundance proteins
Biomarker verification
Biomarker validation
Limitations of mass spectrometry for protein biomarker discovery
Conclusions and future outlook: Integrated biomarker discovery platform
References

Proteomic and Metabolomic Approaches to Biomarker Discovery
https://doi.org/10.1016/B978-0-12-818607-7.00002-5
Copyright © 2013 Elsevier Inc. All rights reserved.
Abbreviations

Da       Daltons
ELISA    enzyme-linked immunosorbent assay
ESI      electrospray ionization
FDA      the U.S. Food and Drug Administration
FWHM     full width at half maximum
LC       liquid chromatography
m/z      mass-to-charge ratio
MALDI    matrix-assisted laser desorption/ionization
MS       mass spectrometry/spectrometer
MS1      mass spectrum collected for all precursor ions in sample prior to fragmentation
MS/MS    tandem mass spectrometry, or mass spectrum collected for fragment ions
PTM      posttranslational modification
SILAC    stable isotope labeling by amino acids in cell culture
SRM      selected reaction monitoring
TOF      time-of-flight mass spectrometry
XIC      extracted ion chromatogram
Introduction

Proteomics is defined as the large-scale study of protein expression, structure, and function in time and space. Relative to genome, transcriptome, or metabolome analysis, the large diversity of protein sequences and multiple posttranslational modifications (PTMs) make proteome analysis an even more challenging undertaking. Unlike the genome, the proteome is dynamic; a static set of genes may result in different proteomic phenotypes depending on the developmental stage of an organism and environmental factors. The dynamic nature of the proteome results in a wide range of protein reference values in healthy individuals, thus complicating the clinical applications of proteomics. The last two decades have seen impressive progress in proteomics, mainly due to significant advances in mass spectrometry (MS), high-throughput antibody production, and bioinformatics and biostatistics algorithms. The Human Proteome Project was launched in September 2010 with the goal of identifying and characterizing at least one protein product for each of the estimated 20,300 protein-coding genes.1 Disease-driven initiatives of the Human Proteome Project lay the foundation for clinical and diagnostic applications of proteins, such as development of disease biomarkers.
Protein biomarker discovery and development pipeline

Development of protein biomarkers is a multiple-phase procedure, analogous to the drug development process. The biomarker development pipeline includes the formulation of a specific clinical question, identification of proteins, selection of biomarker candidates, verification of candidates in an independent cohort of samples, rigorous validation of candidates, development and validation of a clinical assay, and finally assay approval by regulatory health agencies, such as the U.S. Food and Drug Administration (FDA) or the European Medicines Agency (Fig. 1). The cost of a biomarker development study is estimated to be in the range of 10% of an entire drug development study. In addition, the discovery-to-clinical assay timeline may span many years. For example, the cancer biomarker HE4 was discovered in 1999,2 but its clinical assay was not approved until 2008.3

Prior to the launch of a biomarker discovery study, one should first consider unmet clinical needs, decide whether a diagnostic molecule has the potential to answer a specific clinical question with a certain confidence, and predict whether the answer would aid in physicians’ decision making. It should be acknowledged that the clinical decision will be based on a biomarker’s performance in combination with noninvasive medical imaging techniques, such as magnetic resonance imaging (MRI) and/or ultrasound. A high area under the receiver operating characteristic (ROC) curve may not be the sole requirement for a biomarker’s successful use
FIG. 1 The proteomic biomarker development pipeline. As biomarker candidates proceed through the pipeline, the number of clinical samples increases, while analytical technologies change from complex and low-throughput mass spectrometry methods to straightforward and high-throughput immunoaffinity assays.
in clinics. Instead, based on disease character and the cost of the follow-up examination, biomarkers with either higher sensitivity or higher specificity may be preferable.4

Different types of genetic sequence features or biomolecules, such as gene mutations, single-nucleotide polymorphism (SNP) variants, mRNA transcripts, and metabolites, can be used as disease biomarkers. There is a clear advantage, however, to using proteins as biomarkers, stemming from their diversity. There are an estimated 20,300 genes,1 7900 unique metabolites,5 100,000 mRNA transcripts, and up to 1.8 million different protein species, if PTMs are considered.6 Being the ultimate products of gene expression, proteins reflect multiple genomic and transcriptomic alterations in their sequences, PTMs, and cellular abundance levels. A fraction of proteins is secreted into blood and biological fluids and can thus be detected using noninvasive diagnostic tests. The immense diversity of protein species increases the chances of identifying a marker, or a panel of markers, for each disease state. The diversity of protein variants, however, significantly increases the analytical challenge of correct detection and measurement of a specific variant in biological samples. For example, detection of a particular nucleotide in the genome of a cell should meet the analytical challenge of searching through 3.2 × 10^9 nucleotides, while the detection of a specific amino acid in interleukin 6 in blood plasma has the challenge of searching through 10^13 amino acids.7 Use of altered PTMs and protein isoforms as biomarkers is an even more challenging undertaking due to the greater complexity and dynamic turnover of PTMs. For this reason, most protein biomarker studies are still focused on the search for altered protein concentrations in biological samples.

Identification of proteins in cells, tissues, and biological fluids is dominated by MS-based techniques, even though protein and antibody arrays have found their own niches.8–10 At the protein identification phase of a biomarker development pipeline, several thousand protein species are detected in a limited number of biological samples. Relative quantification approaches are then used to compile a short list of candidates for verification in an independent set of clinical samples. Biomarker verification is an important step to exclude false-positive discoveries that arise from the biological and technological bias introduced at the identification phase. Assays used for verification, such as enzyme-linked immunosorbent assays (ELISA) and selected reaction monitoring (SRM),11 provide accurate and reliable comparison of protein levels in dozens to hundreds of clinical samples.
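The trade-off between sensitivity, specificity, and the area under the ROC curve mentioned earlier in this section can be illustrated with a short calculation. The following is a minimal sketch in Python; the marker concentrations, the cutoffs, and the function names are hypothetical and are not taken from any study cited in this chapter.

```python
def sensitivity_specificity(cases, controls, cutoff):
    """Call a sample positive when its marker level is at or above the cutoff."""
    true_positives = sum(level >= cutoff for level in cases)
    true_negatives = sum(level < cutoff for level in controls)
    return true_positives / len(cases), true_negatives / len(controls)


def roc_auc(cases, controls):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case has a higher marker level than a randomly chosen control
    (ties count as one half)."""
    wins = 0.0
    for case_level in cases:
        for control_level in controls:
            if case_level > control_level:
                wins += 1.0
            elif case_level == control_level:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker concentrations (arbitrary units), for illustration only.
cases = [12.0, 35.0, 50.0, 80.0, 120.0]
controls = [5.0, 8.0, 10.0, 15.0, 40.0]

auc = roc_auc(cases, controls)
balanced = sensitivity_specificity(cases, controls, cutoff=30.0)
sensitive = sensitivity_specificity(cases, controls, cutoff=11.0)
```

With these made-up values, lowering the cutoff from 30 to 11 raises sensitivity at the expense of specificity while the AUC stays the same, which is why the preferred operating point depends on the disease and on the cost of follow-up examination rather than on the AUC alone.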
Validation of protein biomarkers includes testing their performance in very large cohorts of clinical samples. Such studies employ standardized preclinical protein assays, rigorous blinded analysis, and multicenter collaborative trials. Finally, a clinical assay is developed for a biomarker and subjected to approval by regulatory health agencies. In vitro diagnostic assays for more than 200 unique proteins are currently approved by the FDA,12 and most of them are based on ELISA. There is not a single MS-based protein assay currently used in clinics,13 but a lot of effort is currently aimed toward the introduction of such assays into clinical practice.14–16
Proteomic samples

The choice of the sample suitable for a biomarker discovery study depends on the specific clinical question addressed, sample availability, and limitations of a biological model (Fig. 2). An array of proteomic samples can be used, but blood plasma or serum is the most relevant biological fluid for screening, diagnostic, or surveillance biomarkers. Blood is the most abundant body fluid and is easily collected by venipuncture, a procedure with minimal invasiveness. Given that all organs are perfused by it, blood reflects the physiologic state of the body at any time.17 However, the proteomic analysis of blood plasma or serum is very challenging due to the wide dynamic range of protein concentrations, which exceeds ten orders of magnitude and is five to six orders of magnitude wider than the dynamic range of MS analysis.18,19 Low-abundance proteins present in blood are usually masked by high-abundance proteins, 22 of which constitute 99% of the total protein mass.18 In addition, physiological concentrations of salts and lipids interfere with MS-based analysis.20 Depletion of high-abundance proteins and extensive fractionation may improve detection of low-abundance proteins, but at the cost of decreased throughput and analytical reproducibility.

In the quest for noninvasive diagnostic protein markers, urine is an attractive biological
fluid, given that it can be collected noninvasively and in large quantities. Although urine proteomics has been widely explored for identification of biomarkers related to renal or urogenital disorders, other health conditions such as cancer and inflammation in distant organs may also result in changes of the urine proteome.21,22 Though it contains fewer proteins than plasma, the urine proteome is still complex, with more than 1,500 proteins identified in healthy individuals.23 Another challenge of urine is the need for normalization and standardization of protein levels across different samples. Protein concentrations in urine depend on the glomerular filtration rate and thus should be normalized against reference molecules such as creatinine.24 The Human Kidney and Urine Proteome Project (HKUPP),25 a Human Proteome Organization (HUPO)-sponsored scientific initiative, provides guidelines for standardized collection and storage of urine samples along with protocols for urine sample preparation. One of the ultimate aims of this organization is to construct a reference database of normal human urine. Due to the challenges of biomarker discovery in blood and urine, the potential of other biological specimens is being widely explored. Primary sites of disease such as tissues and proximal fluids are attractive alternatives for biomarker candidate identification and verification. Commonly used proximal fluids include ascites, cerebrospinal fluid, seminal plasma, expressed prostatic secretion, nipple aspirate fluid, saliva, tears, pancreatic juice, and others. Proximal fluids such as ascites fluid in pancreatic and ovarian cancers26 often enclose the site of the disease and accumulate disease-specific proteins increasing their concentration relative to blood. 
For example, median levels of CA-125, an ovarian cancer biomarker used in the clinic, were 696 and 18,563 U/mL in serum and ascites fluid, respectively.27 Proximal disease fluids, however, are usually collected through invasive procedures, limiting their clinical potential.28 Diseased tissue is the specimen of choice to discover tissue-based prognostic and predictive
[FIG. 2 diagram: the sample types shown are tissues, proximal fluids, cell lines, animal models, blood, and urine, each annotated with its advantages and limitations.]
FIG. 2 Proteomic samples used for biomarker discovery, along with their advantages and limitations. Tissue samples and proximal fluids are usually obtained through highly invasive procedures such as surgery or biopsy, require strict ethical approval by institutional review boards, and are thus the least available specimens for proteomic experiments. Cell lines, in contrast, are readily available through commercial suppliers.
biomarkers because tissues have high levels of protein biomarkers.29 However, biomarker candidates identified using tissue proteomics may not be detectable in the systemic bloodstream
due to insufficient leakage from the tissue to blood, increased degradation by endogenous proteases, or enhanced clearance by the kidneys.30 A major obstacle in proteomic analysis
of tissues is the heterogeneity of cellular and extracellular composition. Laser capture microdissection (LCM) has been proposed as a tool for isolating pure cell populations from tissues, thereby reducing cell heterogeneity.31 However, LCM yields small sample sizes, is labor intensive, and requires fresh frozen tissues and a high level of expertise.32 An advantage of tissues over other specimens is the ability to obtain adjacent nonaffected tissues from the same individual to serve as a control, thus minimizing the effects of biological heterogeneity. Nevertheless, an adjacent tissue may also be transformed at the molecular level and thus may not represent the healthy tissue.33

Given that formalin-fixed paraffin-embedded (FFPE) tissues have been widely collected and preserved for more than a century, the exploitation of FFPE tissues for biomarker discovery warrants a detailed investigation. Fortunately, studies have shown that FFPE tissues are compatible with MS-based proteomic analysis.34

Model systems, such as cell lines and animal models, are also utilized for biomarker discovery. Cell lines are readily available, allow for identification of low-abundance proteins due to the reduced sample complexity, and facilitate studies with minimized biological and experimental variability since cells can be grown under well-defined conditions. No single cell line, however, can recapitulate disease heterogeneity and account for the disease microenvironment.35 Animal models, in contrast to cell lines, incorporate the effect of the host microenvironment. In addition, animal models offer minimal interindividual variability in terms of genetic variation and environmental conditions. Furthermore, animal-derived biological samples can be collected at any stage of disease development.36 Nevertheless, it is debatable whether findings from animal disease models can be accurately translated to human disease.
Regardless of the biological material of choice, clinical samples should be collected in a standardized way following predefined standard operating procedures (SOPs) to minimize
variations due to sample collection, handling, and storage.37 Samples should have detailed clinical annotations, such as gender, race, age, and concurrent use of medications. Given the limited availability of clinical samples, it has been proposed that high-quality samples should be used at the late stages of biomarker development.38 However, analysis of specimens of unknown quality at the identification phase increases the risk of generating false-positive markers that will drain financial and clinical resources at the verification and validation phases. The issue of sample collection and preservation for prospective studies along with the need to store very large specimen collections has driven the development of multiple biobanking initiatives. Biobanking incorporates the proper clinical annotation of specimens along with managing ethical, legal, and social issues that may vary in different states and regions.39 International networking of biobanks facilitates the use of high-quality biological specimens for translational and clinical research.
Protein identification using mass spectrometry

Mass spectrometry-based approaches to protein identification involve either detection of intact proteins, referred to as top-down proteomics, or identification of protein cleavage products, referred to as bottom-up or shotgun proteomics. Top-down strategies retain a lot of information about protein sequence, protein isoforms, and their PTMs. Advances in top-down proteomics allow for the identification of hundreds of intact proteins in yeast and mammalian cells40,41; however, clinical applications of top-down proteomics are still limited. Bottom-up proteomic approaches suffer from a loss of information about protein isoforms and PTMs, especially for low-abundance proteins. On the other hand, bottom-up proteomics greatly benefits from superior liquid chromatography
(LC) separation of peptides prior to MS, requires lower amounts of material, and provides better peptide/protein fragmentation and higher sensitivity. Due to the very high number of routine protein identifications in biological samples, bottom-up proteomics remains the platform of choice for biomarker discovery pipelines. The process of protein identification using bottom-up proteomic methods involves a set of consecutive steps: protein digestion, peptide separation by LC, peptide ionization, gas-phase peptide separation, peptide fragmentation, and detection of the mass-to-charge ratios (m/z) and intensities of peptide ions and their tandem mass spectrometry (MS/MS) fragments. The variety of MS platforms used for protein identification is described in the following subsections.
Protein digestion

Bottom-up proteomic approaches involve proteolytic cleavage of proteins into short peptide fragments using proteases. The most widely used enzyme is chemically modified trypsin, which selectively cleaves peptide bonds C-terminal to lysine and arginine residues.42 A distinct advantage of trypsin is the generation of short doubly or triply charged peptides that are water soluble, well separated by both strong cation-exchange and reversed-phase chromatography, and susceptible to ionization using electrospray ionization (ESI). To increase the number of peptide identifications and protein sequence coverage, protein digestion protocols may be complemented by proteases with different sequence specificities, such as LysC, ArgC, AspN, and GluC.43
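The cleavage rule described above (cut C-terminal to lysine and arginine) is also what database search tools apply when they digest protein sequences in silico. The following is a minimal Python sketch of that rule. The function name, the `missed_cleavages` parameter, and the toy sequence are illustrative assumptions, and the exception for cleavage sites followed by proline that many tools apply is deliberately left out to keep the sketch aligned with the rule as stated in the text.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides produced by cleaving after every K or R.

    missed_cleavages is the maximum number of internal cleavage sites
    that a reported peptide may still contain.
    """
    if not sequence:
        return []
    # Positions just after each K/R are the ends of fully cleaved peptides.
    ends = [i + 1 for i, residue in enumerate(sequence) if residue in "KR"]
    if not ends or ends[-1] != len(sequence):
        ends.append(len(sequence))  # C-terminal peptide without K/R
    starts = [0] + ends[:-1]

    peptides = []
    for i, start in enumerate(starts):
        last = min(i + missed_cleavages, len(ends) - 1)
        for j in range(i, last + 1):
            peptides.append(sequence[start:ends[j]])
    return peptides


# Toy sequence, for illustration only.
fully_cleaved = tryptic_digest("AGKTFRLLE")
with_one_missed = tryptic_digest("AGKTFRLLE", missed_cleavages=1)
```

Allowing missed cleavages enlarges the list of candidate peptides, which is one reason incompletely digested peptides can go unidentified when a search is configured for fully cleaved peptides only.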
Protein and peptide separation techniques

In the last two decades, two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) was a method of choice for protein separation in both top-down and bottom-up protein
identification workflows.44,45 Lately, with advances in bottom-up proteomics, separation techniques have focused on fractionation of short tryptic peptides. Workflows with two-dimensional separation of peptides by strong cation-exchange chromatography or isoelectric focusing followed by reversed-phase LC have allowed for identification of thousands of proteins, significantly increasing proteome coverage.46,47
Protein and peptide ionization techniques

One of the MS advancements that facilitated routine identification of proteins and peptides in biological samples was the discovery of soft ionization techniques.48,49 Soft ionization techniques such as matrix-assisted laser desorption/ionization (MALDI)48 and ESI49 improved the transfer of large biological molecules into the gas phase without significant structural decomposition.

MALDI-MS is applied to the analysis of a variety of molecules that range from small organic compounds to large biomolecules such as immunoglobulins.50 MALDI is initiated by the absorption of an ultraviolet (UV) laser beam by a matrix material mixed with a biological sample.48 As the laser strikes the matrix, it causes ablation of the surface material and the consequent transfer of singly charged analyte ions into the gas phase. Because MALDI is more tolerant of background contaminants that suppress ionization in ESI, such as detergents, extensive sample cleanup is not mandatory. Proteins or peptides can be separated offline into multiple fractions and spotted onto a MALDI matrix plate prior to MS analysis. Recently, MALDI-TOF-MS (time-of-flight mass spectrometry) has emerged as a technique for imaging mass spectrometry (IMS) intended for the analysis of small molecules and intact proteins directly in cells and human tissues.51 IMS requires pretreatment of thin tissue slices with a MALDI matrix followed by scanning of the tissue with a laser beam, thereby providing a
two- and even three-dimensional spatial distribution of intensities of protein, peptide, and small-molecule ions.52 IMS holds promise to replace immunohistochemical (IHC) staining of tissues and facilitate high-throughput approaches to verification of tissue biomarkers.52

Unlike MALDI, ESI involves an online introduction of samples into the mass spectrometer in a solvated state and is currently the most widely used technique for proteomic biomarker discovery. Application of a voltage, typically 2000–5000 V, to the sample emitter tip leads to formation of highly charged droplets that eventually evaporate, allowing ions to enter the mass spectrometer.49 The presence of high-abundance peptides, organic molecules, solvent additives, and detergents can cause ionization suppression, which significantly reduces the ionization efficiency of low-abundance peptides. Since the signal intensity of a molecule depends on its ionization efficiency, significant ionization suppression can make it difficult to detect low-abundance species. A variety of protein and peptide depletion or fractionation approaches are frequently used to reduce competition of low-abundance analytes for charge, diminish ion suppression, and thus increase the number of peptide and protein identifications.18 To alleviate the effect of contaminants, differential proteomic biomarker profiling should employ identical sample preparation protocols, LC-MS instrumentation, and bioinformatics algorithms.
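Because MALDI produces mainly singly charged ions while ESI of tryptic peptides produces mainly doubly or triply charged ions, the same peptide appears at different m/z values depending on the ionization technique. The following is a minimal Python sketch of that relationship for protonated ions; the function names and the example mass of 1000 Da are illustrative assumptions, and only protonation (no sodium or other adducts) is considered.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral monoisotopic mass recovered from an observed m/z and charge."""
    return (mz - PROTON_MASS) * charge


# A hypothetical peptide with a neutral mass of 1000 Da.
singly_charged = mz_from_mass(1000.0, 1)   # typical for MALDI
doubly_charged = mz_from_mass(1000.0, 2)   # typical for ESI of tryptic peptides
triply_charged = mz_from_mass(1000.0, 3)
```

Higher charge states bring large peptides into the m/z range where trapping and quadrupole analyzers operate most effectively, which is part of why ESI pairs well with those instruments.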
Mass spectrometry instrumentation

The past decade has seen substantial improvements in MS/MS instrumentation, efficiency of ion transfer by ion optics, and data interpretation algorithms. Within a decade, the number of proteins routinely identified in mammalian cells increased from a few dozen to more than 10,000.53,54 MALDI-MS is widely used in combination with TOF instruments, as both ionization mode
and mass measurement occur in a pulsed fashion. TOF analyzers derive the mass of an analyte by measuring the flight time of each ion in a vacuum tube. Because TOF instruments were among the earliest instruments capable of high mass accuracy, they were frequently used for top-down proteomics and studies of protein PTMs. As opposed to TOF instruments, ion-trapping (IT) instruments accumulate ions prior to their mass measurement. Since ions are given sufficient time to fill the trap, IT instruments have reasonably high sensitivity. In addition, IT instruments employ ESI, have fast scanning speeds, and offer the ability to perform multiple levels of fragmentation of the same analyte, but at the expense of poor mass accuracy (100–200 ppm) and resolution (2000 full width at half maximum [FWHM]). Quadrupole instruments use the principle of filtering peptide ions in oscillating electric fields, transmitting only ions within a narrow and predefined m/z range. Advantages of quadrupoles include fast scan times and high sensitivity. The resolution of quadrupoles, however, remains relatively low (1000 FWHM). Triple quadrupole mass spectrometers employ consecutive filtering of precursor peptide ions, fragmentation, and filtering of fragments, thus increasing selectivity of analysis.

The introduction of hybrid instruments that combine different modes of ion selection, fragmentation, and separation has revolutionized the field of proteomics. For example, the combination of an ion trap with an Orbitrap, an analyzer that traps ions in an orbit and uses a Fourier transform algorithm to derive m/z, allows for a two-stage identification of peptides, solely in the ion trap, or in the Orbitrap,55,56 or concurrently in both the ion trap and the Orbitrap. This configuration provides high mass accuracy (1–5 ppm) in MS and MS/MS modes, resolution up to 240,000, and relatively fast scan speeds.
LTQ-FTICR, an instrument based on Fourier transform ion cyclotron resonance, offers capabilities of an Orbitrap with resolutions up to
750,000.57 The newest hybrid TOF analyzers not only offer fast scan times, but provide high sensitivity, high mass accuracy (2–5 ppm) and resolution (10,000–40,000) in MS and MS/MS modes.58 Such instruments are equipped with two quadrupoles in front of the TOF analyzer and enable analysis of either whole proteins or tryptic peptides. High mass accuracy and resolution allow for filtering out exact ion masses, thereby reducing background noise and eliminating coeluting contaminants.
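The mass accuracy (ppm) and resolution (FWHM) figures quoted in this subsection translate into concrete m/z windows and peak widths. The following is a minimal Python sketch of that arithmetic. It assumes the usual definitions, namely mass error in ppm as the relative deviation times 10^6 and resolution as m/z divided by the peak width at half maximum; the function names and the example m/z of 500 are illustrative.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mz_window(theoretical_mz, tolerance_ppm):
    """Lower and upper m/z bounds accepted at a given ppm tolerance."""
    delta = theoretical_mz * tolerance_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def peak_width_fwhm(mz, resolution):
    """Peak width at half maximum implied by a resolution value (m / delta m)."""
    return mz / resolution


# For an ion at m/z 500, compare the figures quoted in the text.
ion_trap_window = mz_window(500.0, 200.0)          # 200 ppm
orbitrap_window = mz_window(500.0, 5.0)            # 5 ppm
ion_trap_width = peak_width_fwhm(500.0, 2000)      # resolution 2000
orbitrap_width = peak_width_fwhm(500.0, 240000)    # resolution 240,000
```

At m/z 500, a 200 ppm tolerance accepts any ion within 0.1 m/z units of the expected value, whereas a 5 ppm tolerance narrows this to 0.0025 units, which illustrates how high mass accuracy filters out coeluting contaminants.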
Deconvolution and database search of tandem mass spectra

Regardless of the proteomic platform and choice of MS instrument, the general method of protein or peptide sequencing remains the same. In all cases, measurement of the m/z of a precursor ion is followed by its fragmentation using collision-induced dissociation (CID), electron-capture dissociation (ECD),59 or electron-transfer dissociation (ETD).60 The resulting raw spectrum files contain the m/z values of precursor ions and their MS/MS fragments. In the bottom-up proteomic approach, peptides are identified via matching of experimental MS/MS spectra to theoretical spectra derived from an in silico digest of a database containing all known protein sequences.61 Another search approach uses the vast number of publicly available experimentally derived mass spectra to compile spectral libraries. This process is often referred to as peptide-spectrum matching; it offers faster data analysis and fewer false-positive identifications.62 The probability of correct peptide matching at the MS1 and MS/MS levels is based on the deviation of experimental parent and fragment m/z from theoretical m/z and is assessed using various scoring algorithms, such as Sequest, Mascot, X!Tandem, SpectrumMill, Phenyx, OMSSA, and others.61,63–66 As a result, peptide sequences are derived with certain statistical probabilities
and false discovery rates. The use of high-resolution, high-mass-accuracy instruments reduces the number of peptides in the database that fall within the theoretical m/z range, thereby reducing the number of false-positive peptide-spectrum matches.57 Not all spectra match the theoretical database, as some spectra originate from peptides with PTMs that are not defined in the search algorithm, from peptides with SNPs, miscleaved peptides, solvent ions, contaminant small molecules, lipids, or even airborne molecules of building materials.67 An approach to circumvent the issue of nonspecific or naturally occurring cleavage products is to perform de novo sequencing, in which peptide sequences are derived directly from MS1 precursor and MS/MS fragment ions, without matching to the theoretical database. This challenging task, however, requires clean MS/MS spectra and no interference from fragment ions originating from coeluting peptides and contaminants. The advantage of this approach is the identification of PTMs and unexpected proteolytic peptide fragments.
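The database search described in this subsection can be sketched in a few lines: candidate peptides are first filtered by precursor mass, and the survivors are ranked by how many theoretical fragment ions are found in the experimental spectrum. The following Python sketch is an illustration only. The shared-peak count used here is a deliberately simple stand-in and is not the scoring function of Sequest, Mascot, or any other tool named above; the function names, tolerances, and the three-residue toy peptides are assumptions, and only singly charged b- and y-ions are generated.

```python
# Monoisotopic amino acid residue masses in Da (standard values, 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Theoretical m/z values of singly charged b- and y-ions."""
    ions = []
    for i in range(1, len(peptide)):
        b_ion = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b_ion, y_ion])
    return ions


def within_ppm(observed, theoretical, tolerance_ppm):
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm


def shared_peak_count(observed_peaks, peptide, fragment_tol_ppm):
    """Number of theoretical fragments matched by at least one observed peak."""
    return sum(
        any(within_ppm(peak, ion, fragment_tol_ppm) for peak in observed_peaks)
        for ion in fragment_ions(peptide)
    )


def best_match(precursor_mass, observed_peaks, candidates,
               precursor_tol_ppm=10.0, fragment_tol_ppm=20.0):
    """Return (score, peptide) for the best candidate, or None if none fit."""
    scored = [
        (shared_peak_count(observed_peaks, pep, fragment_tol_ppm), pep)
        for pep in candidates
        if within_ppm(precursor_mass, peptide_mass(pep), precursor_tol_ppm)
    ]
    return max(scored) if scored else None


# Simulated "experimental" spectrum: the exact fragments of the peptide GAK.
observed = fragment_ions("GAK")
result = best_match(peptide_mass("GAK"), observed, ["GAK", "AGK", "SLR"])
```

In this toy case, SLR is rejected at the precursor-mass filter, while GAK and AGK have identical masses and are separated only by their fragment ions, which shows why both MS1 and MS/MS information enter the match. Tightening `precursor_tol_ppm` shrinks the candidate list in the same way the text describes for high-mass-accuracy instruments.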
Posttranslational modifications as disease biomarkers

Disease states can be caused by changes in the PTM of a protein rather than a change in the protein’s abundance. Possible disease-specific PTMs include phosphorylation, glycosylation, methylation, acetylation, ubiquitination, lipidation, and proteolysis.68 Glycosylation and phosphorylation are the most widely studied PTMs. Because many secreted and extracellular proteins are glycosylated, disturbed glycosylation patterns of proteins in blood may indicate an ongoing pathological process in a distant organ.69,70 Furthermore, disturbed glycosylation patterns may be tissue specific even if the protein itself is expressed in multiple tissues. A differential phosphorylation pattern has been noted in several neurodegenerative diseases.71
Disease-specific PTMs are often missed in bottom-up proteomics studies because peptides with PTMs are often poorly ionized by ESI or missed in the subsequent bioinformatics analysis, which does not search for all possible PTMs. Further advances in bottom-up proteomics will eventually lead to more detailed investigation of PTMs in disease. To enable efficient PTM analysis, multiple approaches to enrich PTM peptides, such as lectin72 or titanium oxide73 chromatography, can be used. Analysis of highly branched and heterogeneous oligosaccharide chains would require efficient de novo sequencing methods. High-resolution MS has a lot of potential to enable robust top-down analysis of PTM variations in pathological states.
Protein quantification using mass spectrometry

Protein identification workflows allow for cataloging proteomes of biological samples but cannot provide accurate and reproducible quantification of proteins in large numbers of biological samples. In some biological processes, a small change in protein levels may be crucial and lead to substantial changes in cell signaling outcome or cellular phenotype.74 Quantitative proteomic methods that are accurate and reproducible enough to reveal relatively small changes in protein levels (20%) are essential. Multiple strategies available for protein quantification (Fig. 3) are categorized as either label-free methods or methods involving protein and peptide labeling with chemical tags or heavy isotopes (e.g., 13C and 15N). The major advantages of label-assisted over label-free methods are the ability to derive differential protein ratios within a single MS analysis and higher quantitative accuracy and precision.75 Label-free approaches generally have a wide dynamic range of quantification (i.e., four or five orders of magnitude)
and allow for quantitative comparison of large numbers of samples.
Label-free quantification

Label-free quantification methods, such as spectral counting and extracted ion chromatograms (XIC), not only offer low sample preparation costs but have also greatly improved recently with the advent of high-resolution instruments, reproducible chromatography, and powerful data analysis software.76–78 Spectral counting relies on counting the number of times that all peptides corresponding to a specific protein are sequenced. The more abundant the protein, the higher the number of tryptic peptides available for sequencing, resulting in more MS/MS events, referred to as spectral counts. Spectral counting is applied to relative and absolute protein quantification between different MS runs. Absolute protein quantification requires normalization of spectral counts by correcting for protein length (normalized spectral abundance factor, or NSAF)79 or the possible number of tryptic peptides (exponentially modified protein abundance index, or emPAI).80 This method has a dynamic range of about two to three orders of magnitude but suffers from low precision, accuracy, and reproducibility, especially for low-abundance proteins that are identified by only a few spectral counts.81

Extracted ion chromatogram-based quantification methods rely on measuring the three-dimensional space of peptide ion intensity, m/z, and chromatographic elution time. XIC quantification is more accurate and suitable for measuring relative abundances of medium-abundance proteins because even a single MS/MS spectral count event will have a corresponding MS1 chromatographic peak that can be integrated.81 MS/MS fragmentation is still performed to determine the identity of each peak but is not used for quantification. Another variant of XIC quantification, intensity-based absolute quantification
FIG. 3 Quantitative mass spectrometry approaches. (A) Metabolic labeling, or SILAC (stable isotope labeling by amino acids in cell culture). Control and treated cells are grown in the media with light- or heavy-isotope labeled lysine and arginine to allow for five or six cell divisions, then lysed, mixed in equimolar amounts based on total protein, digested by trypsin, and analyzed by LC-MS/MS. Heavy-isotope labeled peptides show an MS1 mass shift of 6–10 Da. (B) Chemical labeling by ICAT (isotope-coded affinity tags). Cysteine residues are labeled with light or heavy tags, proteins are mixed and trypsin digested, peptides are purified by affinity methods, and analyzed by LC-MS/MS. As a result, peptides with heavy isotope-labeled tags show a mass shift in the MS1 spectrum. (C) Chemical labeling by iTRAQ (isobaric tags for relative and absolute quantification) or TMT (tandem mass tags). Equimolar amounts of total protein extracts are digested by trypsin, peptides are labeled with isobaric amine-reactive tags, mixed and analyzed by LC-MS/MS. Following peptide fragmentation, reporter ions show a mass shift in the MS/MS spectrum. (D) Label-free approaches, including XIC (extracted ion chromatogram) and spectral counting. Following protein digestion, each sample type is analyzed separately by high-resolution mass spectrometry. XIC measures integrated MS1 intensity of a precursor ion; spectral counting measures the number of times the precursor ion was fragmented by MS/MS.
28
2. Proteomic and mass spectrometry technologies for biomarker discovery
(iBAQ), involves dividing the sum of XIC peptide intensities by the number of theoretically observable peptides.82 XIC quantification requires reproducible chromatography to enable alignment of peptide peaks and achieves a dynamic range of four orders of magnitude.
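The three normalizations described above (NSAF, emPAI, and iBAQ) reduce to short formulas. The following Python sketch illustrates them under simple assumptions: the function names, protein identifiers, and input values are hypothetical and are not taken from any cited software package.

```python
from typing import Dict, Iterable


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """NSAF: spectral count per residue, divided by the sum of that value over all proteins."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def empai(n_observed: int, n_observable: int) -> float:
    """emPAI: 10 ** (observed peptides / observable peptides) - 1."""
    return 10 ** (n_observed / n_observable) - 1


def ibaq(peptide_intensities: Iterable[float], n_theoretical_peptides: int) -> float:
    """iBAQ: summed XIC peptide intensities divided by the number of theoretically observable peptides."""
    return sum(peptide_intensities) / n_theoretical_peptides


# Hypothetical spectral counts and protein lengths (in residues) for three proteins.
counts = {"P1": 40, "P2": 10, "P3": 2}
lengths = {"P1": 400, "P2": 200, "P3": 100}
nsaf_values = nsaf(counts, lengths)

# Hypothetical protein with 5 of 10 observable peptides sequenced.
empai_value = empai(5, 10)

# Hypothetical protein with three integrated peptide XICs and six theoretical peptides.
ibaq_value = ibaq([2e6, 1e6, 3e6], 6)
```

NSAF values sum to one across the proteins of a run, which is what makes them comparable between runs of different depth; emPAI and iBAQ are computed per protein.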
Metabolic and enzymatic labeling
A common metabolic labeling strategy, SILAC (stable isotope labeling by amino acids in cell culture), involves addition of heavy isotope-labeled (13C and 15N) amino acids to the cell culture media and their subsequent incorporation into protein sequences during translation in the cell.83 In SILAC experiments, treated and control cells are cultured in media with heavy (13C and 15N) or light (12C and 14N) isotope-labeled lysine and arginine, respectively. After five or more cell divisions, an equimolar mixture of both cell lysates is subjected to the sample preparation protocol. Heavy peptides in such a mixture have the same physical and chemical properties as light peptides but show an MS1 mass shift. The ratio of heavy-to-light peptide intensities corresponds to the relative protein abundance between treated and control cells. SILAC experiments have excellent precision because run-to-run variation in LC-MS does not affect the peptide ratio; however, performing SILAC on complex samples using slow-scanning instruments and dynamic exclusion settings results in missed protein identifications due to the doubled sample complexity. Only actively dividing cells, such as established cancer cell lines, are amenable to SILAC. Some primary and slowly dividing cells can hardly be cultured for five divisions and, as a result, cannot be fully labeled. Labeling of proteins of whole organisms, such as bacteria, yeast, fruit flies, and even mice, is also possible by feeding them a diet containing heavy-labeled amino acids.84–89 A heavy SILAC protein mixture can also be used as a reference standard when spiked into nonlabeled normal and disease biological fluids.90 On the negative side, SILAC experiments are relatively expensive and have a rather narrow differential quantification range of approximately twentyfold.91 Another approach to incorporating heavy isotopes into peptides involves exchange of two 16O atoms for two 18O atoms at the peptide C-terminus during enzymatic digestion of proteins in 18O-labeled water (H2(18)O).92 As a result, an MS shift of 4 Da between 16O- and 18O-labeled peptides is observed. The major caveat of this methodology, however, is nonhomogeneous labeling, which results in mixed 16O18O labels, thereby affecting 16O/18O ratios.
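The mass shifts quoted for SILAC (6–10 Da) and 18O labeling (4 Da) follow directly from isotope mass differences. The sketch below works them out and computes a heavy-to-light protein ratio. The specific labels (13C6-lysine and 13C6,15N4-arginine), the example peptide, the intensities, and the use of a median across peptides are illustrative assumptions, not details given in the text.

```python
from statistics import median

# Monoisotopic mass differences between heavy and light isotopes, in Da.
DELTA_13C = 13.0033548 - 12.0000000  # 13C minus 12C
DELTA_15N = 15.0001089 - 14.0030740  # 15N minus 14N
DELTA_18O = 17.9991596 - 15.9949146  # 18O minus 16O

# Assumed labels: 13C6-lysine and 13C6,15N4-arginine.
LYS_SHIFT = 6 * DELTA_13C
ARG_SHIFT = 6 * DELTA_13C + 4 * DELTA_15N

# Two 16O atoms exchanged for two 18O atoms at the peptide C-terminus.
O18_SHIFT = 2 * DELTA_18O


def silac_mz_shift(peptide: str, charge: int) -> float:
    """m/z difference between the heavy and light forms of a peptide at a given charge."""
    mass_shift = peptide.count("K") * LYS_SHIFT + peptide.count("R") * ARG_SHIFT
    return mass_shift / charge


def protein_ratio(heavy, light) -> float:
    """Median heavy-to-light intensity ratio over the peptides of one protein."""
    return median(h / l for h, l in zip(heavy, light))


# Hypothetical tryptic peptide with one C-terminal lysine, observed as a 2+ ion.
shift_2plus = silac_mz_shift("LVNELTEFAK", 2)

# Hypothetical MS1 intensities for three peptides of one protein.
ratio = protein_ratio(heavy=[2.0, 4.2, 3.9], light=[1.0, 2.0, 2.0])
```

With these assumed labels the lysine shift is about 6.02 Da and the arginine shift about 10.01 Da, matching the 6–10 Da range in the figure caption; the doubly charged example peptide is therefore displaced by about 3.01 m/z units.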
Chemical labeling
Approaches to chemical labeling of proteomic samples use chemically reactive tags labeled with heavy or light isotopes. For instance, isotope-coded affinity tags (ICAT) allow for labeling of cysteine residues in proteins.93 Once labeled, proteins from both groups are combined and digested, the tagged peptides are affinity-purified through their biotin tags, and heavy- and light-labeled peptides are quantified based on their differential MS1 signals. Exclusive labeling of cysteines is the main limitation of this approach, as it reduces protein sequence coverage. On the other hand, affinity capture of these peptides significantly reduces sample complexity, which facilitates quantification of low-abundance proteins. Isobaric tags for relative and absolute quantification (iTRAQ)94 and tandem mass tags (TMT)95 are amine-reactive tags that produce reporter ions upon MS/MS peptide fragmentation. Following protein digestion, iTRAQ allows for peptide labeling in up to eight different biological conditions. Following labeling, peptides from all conditions are pooled and analyzed using LC-MS/MS (Fig. 3). Unlike other labeling approaches, iTRAQ utilizes MS/MS spectra for relative quantification.75
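Reporter-ion quantification can be pictured as reading a few fixed m/z channels out of each MS/MS spectrum and comparing their intensities. The following sketch assumes a four-channel tag set with approximate nominal reporter m/z values; the channel names, m/z values, tolerance, and peak list are illustrative assumptions, and exact values depend on the reagent used.

```python
from typing import Dict, List, Tuple

# Approximate nominal reporter-ion m/z values for a hypothetical 4-plex isobaric tag set.
# These are illustrative; consult the reagent documentation for exact masses.
REPORTER_MZ = {"114": 114.11, "115": 115.11, "116": 116.11, "117": 117.11}


def reporter_intensities(peaks: List[Tuple[float, float]], tol: float = 0.02) -> Dict[str, float]:
    """Sum the intensity of MS/MS peaks lying within tol (in m/z units) of each reporter channel."""
    result = {channel: 0.0 for channel in REPORTER_MZ}
    for mz, intensity in peaks:
        for channel, target in REPORTER_MZ.items():
            if abs(mz - target) <= tol:
                result[channel] += intensity
    return result


def relative_abundance(intensities: Dict[str, float], reference: str = "114") -> Dict[str, float]:
    """Express every channel relative to the chosen reference channel."""
    ref = intensities[reference]
    return {channel: value / ref for channel, value in intensities.items()}


# Hypothetical MS/MS peak list as (m/z, intensity); the last peak is a sequence fragment, not a reporter.
spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.112, 500.0), (117.108, 1000.0), (175.119, 300.0)]
channels = reporter_intensities(spectrum)
ratios = relative_abundance(channels)
```

In practice, ratios from many MS/MS spectra of the same protein are combined, and a peptide must be selected for fragmentation before it can be quantified at all.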
Selected reaction monitoring assays
SRM is a quantitative analytical assay performed on a triple-quadrupole, quadrupole-ion trap, or quadrupole-TOF mass spectrometer. Whereas protein identification approaches are designed to identify thousands of proteins in a limited number of samples, SRM assays are intended to measure a very limited number of proteins in a large set of samples. This strategy makes SRM an attractive technique for biomarker verification and possibly even validation. In general, an SRM assay includes the following steps: digestion of proteins, LC separation of peptides, ionization of peptides by ESI, filtering of peptides in the first quadrupole, fragmentation of peptides in the second quadrupole, filtering of peptide fragments in the third quadrupole, and measurement of intensities of three selected fragment ions.96,97 A known amount of a heavy isotope-labeled peptide is spiked into the digest and used to calculate the absolute amount of the endogenous light peptide. Addition of stable isotope-labeled peptide standards increases the specificity and reproducibility of quantification because the standards enable correct identification of the analyte peak in the presence of multiple contaminant peptides and accurate relative quantification. It is assumed that trypsin digestion is complete and that the amount of proteotypic peptide reflects the absolute amount of the corresponding protein. Such an assumption is not always correct but is acceptable when the relative abundance of proteins is measured. More accurate measurement of absolute protein amounts is achieved with heavy isotope-labeled proteins98 or concatenated peptide standards,99 which account for variation in protein digestion. With state-of-the-art SRM assays, up to 100 peptides representing 100 medium-to-high-abundance proteins in the range 0.1 μg/mL to 1 mg/mL can be measured simultaneously in the unfractionated digest of a biological fluid while achieving coefficients of variation under 20%.47,100 There are several concerns with SRM-based assays, and these mostly stem from sample complexity and limitations in instrument sensitivity and selectivity. Ideally, the number of sample preparation steps prior to LC-SRM measurement should be minimal to allow for high-throughput analysis and minimize variability, although this benefit comes at the cost of decreased assay sensitivity. SRM assays are developed using either experimental proteome identification data or publicly available databases such as PeptideAtlas101 or the GPM proteome database.102 Advantages of these databases include integration of hundreds of experiments and unique algorithms that rank proteotypic peptides by their performance in LC-MS/MS experiments, allowing for prediction of which peptides are suitable for SRM assay development. Synthetic peptides can also be used at this point to facilitate assay development. Software tools designed to aid in SRM assay development include commercial software provided by instrument vendors, such as Pinpoint® (Thermo Fisher Inc.) and MRMPilot® (AB Sciex Inc.), as well as license-free Skyline,103 MRMaid,104 mProphet,105 and SRMCollider.106 Among all MS techniques, SRM assays remain the methods of choice for protein quantification and biomarker verification due to their sensitivity, high-throughput capabilities, and multiplexing potential.
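The calculation behind a spiked-standard SRM measurement is a simple ratio: the endogenous amount equals the light-to-heavy peak area ratio multiplied by the known amount of heavy peptide. The sketch below shows that calculation and the coefficient of variation across replicates; the function names, the summing of three transitions, and all numeric values are hypothetical.

```python
from statistics import mean, stdev


def endogenous_amount(light_area: float, heavy_area: float, heavy_spiked_fmol: float) -> float:
    """Amount of endogenous (light) peptide from the light-to-heavy peak area ratio."""
    return light_area / heavy_area * heavy_spiked_fmol


def cv_percent(values) -> float:
    """Coefficient of variation (sample standard deviation over mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical integrated peak areas for three transitions of one peptide.
light_transitions = [1200.0, 800.0, 400.0]
heavy_transitions = [2400.0, 1600.0, 800.0]

# Assumption for illustration: transition areas are summed before taking the ratio.
amount_fmol = endogenous_amount(sum(light_transitions), sum(heavy_transitions), heavy_spiked_fmol=50.0)

# Hypothetical amounts (fmol) measured in three replicate injections.
replicate_cv = cv_percent([24.0, 25.0, 26.0])
```

Here the light signal is half the heavy signal, so 50 fmol of spiked standard implies 25 fmol of endogenous peptide, and the three replicates give a coefficient of variation of 4%. As the text notes, converting this peptide amount into a protein amount assumes complete digestion.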
Separation and enrichment strategies for quantification of low-abundance proteins
Relatively low sensitivity (about 100 ng/mL) and moderate throughput of MS-based protein assays remain two major limitations of their use for biomarker validation studies and clinical analysis. Blood serum levels of many established clinical biomarkers are in the 10 pg/mL to 10 ng/mL range,18 and high-abundance proteins mask such low-abundance biomarkers and significantly compromise their quantification by MS. Thus, LC-SRM measurement of low-abundance proteins can be achieved only through additional separation and enrichment. Strategies such as strong anion- or cation-exchange chromatography and isoelectric focusing are used to remove high-abundance proteins or enrich low-abundance proteins.107–109 Major high-abundance proteins can also be removed by immunodepletion using affinity columns.107,110 Alternatively, low-abundance proteins can be enriched by affinity purification using antibodies or aptamers111–113; however, this approach has a reduced multiplexing potential. Similar approaches, such as SISCAPA (stable isotope standards and capture by antipeptide antibodies), employ antibodies developed against proteotypic peptides.114,115 Because antibody development against synthetic peptides is more straightforward than against intact proteins, use of such approaches is increasing. Improved sensitivity (down to 1 ng/mL) and increased multiplexing and throughput capabilities of SISCAPA assays enable accurate verification of biomarker candidates in blood plasma.115,116 In addition, as many known protein biomarkers in clinical use are posttranslationally modified by N-glycosylation,117 lectin affinity chromatography is sometimes used to enrich N-glycoproteins and N-glycopeptides prior to LC-SRM analysis.96,117
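The gap between assay sensitivity and biomarker concentration can be expressed as a required enrichment factor. The short sketch below is a back-of-the-envelope calculation that takes the concentrations quoted above at face value; the function name and the choice of picograms per milliliter as the working unit are assumptions made for illustration.

```python
import math


def required_enrichment(assay_limit_pg_ml: float, target_pg_ml: float) -> float:
    """Fold enrichment needed to bring a target concentration up to the assay's sensitivity limit."""
    return assay_limit_pg_ml / target_pg_ml


# Concentrations from the text, converted to pg/mL.
UNFRACTIONATED_LIMIT = 100_000.0  # about 100 ng/mL for a direct MS-based assay
SISCAPA_LIMIT = 1_000.0           # down to 1 ng/mL with peptide immunoaffinity enrichment
LOW_BIOMARKER = 10.0              # lower end of the clinical biomarker range, 10 pg/mL

fold_direct = required_enrichment(UNFRACTIONATED_LIMIT, LOW_BIOMARKER)
orders_direct = math.log10(fold_direct)
fold_after_siscapa = required_enrichment(SISCAPA_LIMIT, LOW_BIOMARKER)
```

Under these figures, a biomarker at 10 pg/mL sits four orders of magnitude below the sensitivity of a direct assay, and two orders of magnitude remain to be gained even at a 1 ng/mL limit.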
Biomarker verification
Upon completion of the protein identification phase, anywhere from dozens to hundreds of proteins are usually selected as potential biomarkers. Large analytical variation and poor reproducibility of commonly used label-free approaches constitute serious technological limitations of the identification phase. Biological factors, such as intraindividual variation of protein levels during the day and the wide interindividual distribution of physiological protein levels in healthy individuals, also result in significant bias. The potential of a protein biomarker should therefore first be confirmed by verification using an independent set of samples. Even though the number of publications reporting identification of potential biomarkers is rapidly increasing, the rate at which new protein biomarkers gain regulatory agency approval has steadily decreased over the last decade.18,118 This decrease can be partially explained by the high number of false-positive candidates generated at the identification phase, difficulties in proceeding to the biomarker verification and validation phases, and a shortage of academic grants that fund translation of discovery data into the clinic. Regardless of the specimen analyzed during the protein identification phase, verification should be performed with specimens that are intended for clinical use and accurately reflect the target population.119 Proper control subjects, as defined by the inclusion and exclusion criteria, are essential for meaningful data interpretation and should be matched for physiologic factors such as age and gender to control for potential confounding factors. Preanalytical sources of variation, such as biases in sample collection and storage, should also be carefully evaluated, especially given that verification studies are performed with retrospectively collected samples. Finally, the size of the study population should be calculated to ensure adequate statistical power,119 and the results of the study must undergo rigorous statistical analysis. The importance of appropriate statistical analysis is sometimes overlooked in the biomarker discovery field. At the initial steps of protein biomarker discovery, thousands of proteins are typically identified, and candidates are selected based on their relative abundance in disease versus control groups.
Proper selection of candidates, however, should include robust statistical analysis based on the statistical probability (P-values) of differentiating groups of samples. Furthermore, because thousands of proteins are tested simultaneously, P-values should be corrected for multiple hypothesis testing.120,121 Such correction should also be performed when a set of biomarker candidates is verified using a multiplex SRM assay. Development of multimarker diagnostic signatures requires even more advanced statistical algorithms.122,123 By the end of the verification phase, many biomarker candidates are eliminated, resulting in a small and manageable list of candidates that will proceed to the biomarker validation phase.
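One widely used correction for multiple hypothesis testing is the Benjamini-Hochberg procedure, which controls the false discovery rate. The text does not name a specific method, so the choice here is an assumption, and the P-values are hypothetical. A minimal Python sketch:

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the same order as the input."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four candidate proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)

# Candidates retained at a 5% false discovery rate.
significant = [i for i, q in enumerate(adjusted_p) if q <= 0.05]
```

With only four tests all candidates survive, but with thousands of proteins the adjustment removes many candidates whose raw P-values fall below 0.05 by chance alone.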
Biomarker validation
Biomarker validation is a multifaceted procedure that requires collaboration of multiple clinical centers and carries a significant financial burden. Ideally, only the most promising candidates that have proven their potential at the verification phase and for which robust quantitative assays have been developed will enter the validation phase. The importance of high-quality quantitative assays was demonstrated by the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, in which multiple ovarian cancer biomarker candidates were tested.124 The trial showed that only markers whose analytical assays achieved a coefficient of variation of less than 30% performed with adequate diagnostic sensitivity. Validation studies should be performed in both a retrospective and a prospective manner using independent sample cohorts, ideally collected by multiple hospitals.119 Unbiased presentation of the results of validation studies holds the key to the final assessment of biomarker performance.125 The study population should recapitulate the general population in terms of both disease prevalence and stage to allow for correct data interpretation and evaluation of biomarker performance. Power calculations are necessary to define the appropriate study size and ensure adequate statistical power. The validation phase requires large numbers of high-quality specimens, the availability of which may be the bottleneck of biomarker development. International multicenter collaborations and centralized registries of clinical samples have been founded to alleviate this limitation.125 To minimize preanalytical biases, all samples should be collected, stored, and processed using predefined standard operating procedures (SOPs). The influence of preanalytical parameters, such as sample handling and preparation, protein stability, and intra- and interindividual variation, needs to be addressed prior to large-scale validation studies. The ultimate question of the validation phase is whether the biomarker candidate addresses the unmet clinical need that prompted its search. However, the true clinical utility of a biomarker cannot be assessed without introduction of the marker into the clinic and continuous monitoring of its performance for extended periods of time.
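A power calculation can be sketched with the standard normal approximation for comparing the mean biomarker level between two equally sized groups. This is a simplified illustration: the two-sided test, the equal group sizes, the default significance level and power, and the effect sizes are all assumptions, and real validation studies often require designs based on diagnostic sensitivity and specificity instead.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided, two-sample comparison of means.

    effect_size is the difference in group means divided by the common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical effect sizes: a moderate and a large separation between disease and control groups.
n_moderate = samples_per_group(0.5)
n_large = samples_per_group(1.0)
```

Under these assumptions, a moderate separation of half a standard deviation calls for 63 subjects per group, whereas a separation of one standard deviation needs only 16, which illustrates how quickly study size grows for weakly discriminating markers.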
Limitations of mass spectrometry for protein biomarker discovery
Limitations of protein biomarker development studies stem from biological factors, such as intra- and interindividual variation of protein concentrations; preanalytical variations, such as protein stability; and technological limitations of proteomic sample preparation and MS. Major limitations of proteomics and MS, both in general and as techniques for biomarker discovery studies, include:
• Lack of a general quantitative relationship between ion intensity and the amount of analyte, which makes all MS-based measurements relative.
• Significant matrix effects, which result in ion suppression and deviation from a linear correlation between protein amount and spectral intensity of the same analyte.
• Multiple steps of protein fractionation, derivatization, and trypsin digestion in bottom-up proteomic approaches, which lead to high day-to-day variability and low reproducibility of protein assays.
Biological biases and poor-quality clinical samples, amplified by technological limitations of MS, often lead to a large number of false-positive discoveries. Considering the high cost of MS instruments and their maintenance, complex data analysis, and the need for highly experienced personnel, a large number of false discoveries makes biomarker discovery a costly and ineffective exercise that can lead to general frustration. Awareness of the methodological limitations of proteomics and MS and careful design of biomarker development pipelines should decrease the number of potential biomarkers that never end up in the clinic. Careful study design can hopefully alleviate the disappointment in the inability of proteomics to deliver on the initial promise of large numbers of disease-specific biomarkers.126
Conclusions and future outlook: Integrated biomarker discovery platform
A set of biological concepts and analytical techniques can be incorporated into an integrated protein biomarker development platform. Current biomarker discovery strategies often rely on identification of differentially expressed proteins and their association with a certain disease. The exact mechanism of differential expression and the functional role of protein biomarkers in disease are often neither known nor studied. An integrated biomarker discovery platform needs to be complemented with genomic, transcriptomic, and metabolomic data. The main purpose of an integrated platform is not only to make use of the data accumulated by all -omics technologies but also to plan all steps and phases down the long road that may lead to a regulatory agency approved clinical assay. It should be acknowledged that the ultimate goal of biomarker development is not merely to separate groups of clinical samples but to provide reliable guidance for correct decision making in the clinic, such as performing a relevant diagnostic biopsy or surgery or providing relevant therapy. Thus, the discovery, verification, and validation steps of the biomarker discovery pipeline should be tuned for a specific purpose that is defined a priori, before the biomarker development study begins. The protein biomarker discovery and development field is projected to grow significantly and become an important part of biomedical research aimed at detecting diseases at early stages, reducing the financial burden on healthcare, and allowing for personalized medicine approaches.
References 1. Legrain P, Aebersold R, Archakov A, Bairoch A, Bala K, Beretta L, et al. The human proteome project: current state and future direction. Mol Cell Proteomics 2011;10(7) M111.009993. 2. Ono K, Tanaka T, Tsunoda T, Kitahara O, Kihara C, Okamoto A, et al. Identification by cDNA microarray of genes involved in ovarian carcinogenesis. Cancer Res 2000;60(18):5007–11. 3. Anastasi E, Marchei GG, Viggiani V, Gennarini G, Frati L, Reale MG. HE4: a new potential early biomarker for the recurrence of ovarian cancer. Tumour Biol 2010;31(2): 113–9. 4. Hartwell L, Mankoff D, Paulovich A, Ramsey S, Swisher E. Cancer biomarkers: a systems approach. Nat Biotechnol 2006;24(8):905–8. 5. Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 2009;37:D603–10. 6. Jensen ON. Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr Opin Chem Biol 2004;8(1):33–41. 7. Landegren U, Vanelid J, Hammond M, Nong RY, Wu D, Ulleras E, et al. Opportunities for sensitive plasma proteome analysis. Anal Chem 2012;84(4):1824–30. 8. Schroder C, Jacob A, Tonack S, Radon TP, Sill M, Zucknick M, et al. Dual-color proteomic profiling of complex samples with a microarray of 810 cancer-related antibodies. Mol Cell Proteomics 2010;9(6):1271–80.
9. Tabakman SM, Lau L, Robinson JT, Price J, Sherlock SP, Wang H, et al. Plasmonic substrates for multiplexed protein microarrays with femtomolar sensitivity and broad dynamic range. Nat Commun 2011;2:466. 10. Wu W, Slastad H, de la Rosa Carrillo D, Frey T, Tjonnfjord G, Boretti E, et al. Antibody array analysis with label-based detection and resolution of protein size. Mol Cell Proteomics 2009;8(2):245–57. 11. Picotti P, Aebersold R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat Meth 2012;9(6):555–66. 12. Anderson NL. The clinical plasma proteome: a survey of clinical assays for proteins in plasma and serum. Clin Chem 2010;56(2):177–85. 13. Li J, Kelm KB, Tezak Z. Regulatory perspective on translating proteomic biomarkers to clinical diagnostics. J Proteomics 2011;74(12):2682–90. 14. Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 2009;27(7):633–41. 15. Prakash A, Rezai T, Krastins B, Sarracino D, Athanas M, Russo P, et al. Platform for establishing interlaboratory reproducibility of selected reaction monitoring-based mass spectrometry peptide assays. J Proteome Res 2010;9(12):6678–88. 16. Lopez MF, Rezai T, Sarracino DA, Prakash A, Krastins B, Athanas M, et al. Selected reaction monitoring-mass spectrometric immunoassay responsive to parathyroid hormone and related variants. Clin Chem 2010;56(2):281–90. 17. Issaq HJ, Xiao Z, Veenstra TD. Serum and plasma proteomics. Chem Rev 2007;107(8):3601–20. 18. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002;1(11):845–67. 19. Domon B, Aebersold R. Mass spectrometry and protein analysis. Science 2006;312(5771):212–7. 20. Zhao X, Barber-Singh J, Shippy SA.
MALDI-TOF MS detection of dilute, volume-limited peptide samples with physiological salt levels. Analyst 2004;129(9):817–22. 21. Sobhani K. Urine proteomic analysis: use of two-dimensional gel electrophoresis, isotope coded affinity tags, and capillary electrophoresis. Methods Mol Biol 2009;641:325–46. 22. Dudley JT, Butte AJ. Identification of discriminating biomarkers for human disease using integrative network biology. Pac Symp Biocomput 2009;27–38. 23. Adachi J, Kumar C, Zhang Y, Olsen JV, Mann M. The human urinary proteome contains more than 1500 proteins, including a large proportion of membrane proteins. Genome Biol 2006;7(9):R80.
24. Waikar SS, Sabbisetti VS, Bonventre JV. Normalization of urinary biomarkers to creatinine during changes in glomerular filtration rate. Kidney Int 2010;78(5):486–94. 25. Yamamoto T. The 4th Human Kidney and Urine Proteome Project (HKUPP) workshop. 26 September 2009, Toronto, Canada. Proteomics 2010;10 (11):2069–70. 26. Kuk C, Kulasingam V, Gunawardana CG, Smith CR, Batruch I, Diamandis EP. Mining the ovarian cancer ascites proteome for potential ovarian cancer biomarkers. Mol Cell Proteomics 2009;8(4):661–9. 27. Sedlaczek P, Frydecka I, Gabrys M, Van Dalen A, Einarsson R, Harlozinska A. Comparative analysis of CA125, tissue polypeptide specific antigen, and soluble interleukin-2 receptor alpha levels in sera, cyst, and ascitic fluids from patients with ovarian carcinoma. Cancer 2002;95(9):1886–93. 28. Teng PN, Bateman NW, Hood BL, Conrads TP. Advances in proximal fluid proteomics for disease biomarker discovery. J Proteome Res 2010;9(12):6091–100. 29. Kondo T. Tissue proteomics for cancer biomarker development: laser microdissection and 2D-DIGE. BMB Rep 2008;41(9):626–34. 30. Good DM, Thongboonkerd V, Novak J, Bascands JL, Schanstra JP, Coon JJ, et al. Body fluid proteomics for biomarker discovery: lessons from the past hold the key to success in the future. J Proteome Res 2007;6 (12):4549–55. 31. Banks RE, Dunn MJ, Forbes MA, Stanley A, Pappin D, Naven T, et al. The potential use of laser capture microdissection to selectively obtain distinct populations of cells for proteomic analysis—preliminary findings. Electrophoresis 1999;20(4-5):689–700. 32. Gutstein HB, Morris JS. Laser capture sampling and analytical issues in proteomics. Expert Rev Proteomics 2007;4(5):627–37. 33. Soto AM, Sonnenschein C. Emergentism as a default: cancer as a problem of tissue organization. J Biosci 2005;30(1):103–18. 34. Ralton LD, Murray GI. The use of formalin fixed wax embedded tissue for proteomic analysis. J Clin Pathol 2011;64(4):297–302. 35. Kulasingam V, Diamandis EP. 
Tissue culture-based breast cancer biomarker discovery platform. Int J Cancer 2008;123(9):2007–12. 36. Frese KK, Tuveson DA. Maximizing mouse cancer models. Nat Rev Cancer 2007;7(9):645–58. 37. Ransohoff DF, Gourlay ML. Sources of bias in specimens for research about molecular markers for cancer. J Clin Oncol 2010;28(4):698–704. 38. Hinestrosa MC, Dickersin K, Klein P, Mayer M, Noss K, Slamon D, et al. Shaping the future of biomarker
research in breast cancer to ensure clinical relevance. Nat Rev Cancer 2007;7(4):309–15. 39. Vaught JB, Caboux E, Hainaut P. International efforts to develop biospecimen best practices. Cancer Epidemiol Biomarkers Prev 2010;19(4):912–5. 40. Tran JC, Zamdborg L, Ahlf DR, Lee JE, Catherman AD, Durbin KR, et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 2011;480(7376):254–8. 41. Kellie JF, Catherman AD, Durbin KR, Tran JC, Tipton JD, Norris JL, et al. Robust analysis of the yeast proteome under 50 kDa by molecular-mass-based fractionation and top-down mass spectrometry. Anal Chem 2012;84(1):209–15. 42. Northrop JH, Kunitz M. Isolation of protein crystals possessing tryptic activity. Science 1931;73(1888):262–3. 43. Swaney DL, Wenger CD, Coon JJ. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J Proteome Res 2010;9(3):1323–9. 44. Shevchenko A, Jensen ON, Podtelejnikov AV, Sagliocco F, Wilm M, Vorm O, et al. Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc Natl Acad Sci U S A 1996;93(25):14440–5. 45. Clauser KR, Hall SC, Smith DM, Webb JW, Andrews LE, Tran HM, et al. Rapid mass spectrometric peptide sequencing and mass matching for characterization of human melanoma proteins isolated by two-dimensional PAGE. Proc Natl Acad Sci U S A 1995;92(11):5072–6. 46. Washburn MP, Wolters D, Yates JR 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 2001;19(3):242–7. 47. Drabovich AP, Pavlou MP, Dimitromanolakis A, Diamandis EP. Quantitative analysis of energy metabolic pathways in MCF-7 breast cancer cells by selected reaction monitoring assay. Mol Cell Proteomics 2012;11:422–34. 48. Karas M, Bachmann D, Bahr U, Hillenkamp F. Matrix-assisted ultraviolet laser desorption of non-volatile compounds. Int J Mass Spectrom Ion Processes 1987;78:53–68. 49. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM. Electrospray ionization for mass spectrometry of large biomolecules. Science 1989;246(4926):64–71.
50. Alexander AJ, Hughes DE. Monitoring of IgG antibody thermal stability by micellar electrokinetic capillary chromatography and matrix-assisted laser desorption/ionization mass spectrometry. Anal Chem 1995;67(20):3626–32. 51. Caprioli RM, Farmer TB, Gile J. Molecular imaging of biological samples: localization of peptides and proteins using MALDI-TOF MS. Anal Chem 1997;69(23):4751–60. 52. Schwamborn K, Caprioli RM. Molecular imaging by mass spectrometry - looking beyond classical histology. Nat Rev Cancer 2010;10(9):639–46. 53. Michalski A, Damoc E, Hauschild JP, Lange O, Wieghaus A, Makarov A, et al. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol Cell Proteomics 2011;10(9):M111.011015. 54. Geiger T, Wehner A, Schaab C, Cox J, Mann M. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol Cell Proteomics 2012;11(3):M111.014050. 55. Makarov A. Electrostatic axially harmonic orbital trapping: a high-performance technique of mass analysis. Anal Chem 2000;72(6):1156–62. 56. Hu Q, Noll RJ, Li H, Makarov A, Hardman M, Cooks RG. The Orbitrap: a new mass spectrometer. J Mass Spectrom 2005;40(4):430–43. 57. Scigelova M, Hornshaw M, Giannakopulos A, Makarov A. Fourier transform mass spectrometry. Mol Cell Proteomics 2011;10(7):M111.009431.
58. Andrews GL, Simons BL, Young JB, Hawkridge AM, Muddiman DC. Performance characteristics of a new hybrid quadrupole time-of-flight tandem mass spectrometer (TripleTOF 5600). Anal Chem 2011;83(13):5442–6. 59. Zubarev RA, Kelleher NL, McLafferty FW. Electron capture dissociation of multiply charged protein cations. A nonergodic process. J Am Chem Soc 1998;120(13):3265–6. 60. Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci U S A 2004;101(26):9528–33. 61. Eng JK, McCormack AL, Yates JR 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994;5(11):976–89. 62. Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007;7(5):655–67. 63. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;20(18):3551–67. 64. Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004;20(9):1466–7. 65. Colinge J, Masselot A, Giron M, Dessingy T, Magnin J. Towards high-throughput tandem mass spectrometry data identification. Proteomics 2003;3(8):1454–63.
66. Carr S, Aebersold R, Baldwin M, Burlingame A, Clauser K, Nesvizhskii A. The need for guidelines in publication of peptide and protein identification data: working group on publication guidelines for peptide and protein identification data. Mol Cell Proteomics 2004;3(6):531–3. 67. Guo X, Bruins AP, Covey TR. Characterization of typical chemical background interferences in atmospheric pressure ionization liquid chromatography-mass spectrometry. Rapid Commun Mass Spectrom 2006;20(20):3145–50. 68. Karsdal MA, Henriksen K, Leeming DJ, Woodworth T, Vassiliadis E, Bay-Jensen AC. Novel combinations of Post-Translational Modification (PTM) neo-epitopes provide tissue-specific biochemical markers - are they the cause or the consequence of the disease? Clin Biochem 2010;43(10-11):793–804. 69. Kuzmanov U, Jiang N, Smith CR, Soosaipillai A, Diamandis EP. Differential N-glycosylation of kallikrein 6 derived from ovarian cancer cells or the central nervous system. Mol Cell Proteomics 2009;8(4):791–8. 70. Peracaula R, Tabares G, Royle L, Harvey DJ, Dwek RA, Rudd PM, et al. Altered glycosylation pattern allows the distinction between prostate-specific antigen (PSA) from normal and tumor origins. Glycobiology 2003;13(6):457–70. 71. Hampel H, Blennow K, Shaw LM, Hoessler YC, Zetterberg H, Trojanowski JQ. Total and phosphorylated tau protein as biological markers of Alzheimer’s disease. Exp Gerontol 2010;45(1):30–40. 72. Kaji H, Yamauchi Y, Takahashi N, Isobe T. Mass spectrometric identification of N-linked glycopeptides using lectin-mediated affinity capture and glycosylation site-specific stable isotope tagging. Nat Protoc 2007;1(6):3019–27. 73. Mazanek M, Mitulović G, Herzog F, Stingl C, Hutchins JR, Peters JM, et al. Titanium dioxide as a chemo-affinity solid phase in offline phosphopeptide chromatography prior to HPLC-MS/MS analysis. Nat Protoc 2007;2(5):1059–69. 74. Legewie S, Bluthgen N, Schafer R, Herzel H.
Ultrasensitization: switch-like regulation of cellular signaling by transcriptional induction. PLoS Comput Biol 2005;1(5):e54. 75. Wang H, Alvarez S, Hicks LM. Comprehensive comparison of iTRAQ and label-free LC-based quantitative proteomics approaches using two Chlamydomonas reinhardtii strains of interest for biofuels engineering. J Proteome Res 2012;11(1):487–501. 76. Old WM, Meyer-Arendt K, Aveline-Wolf L, Pierce KG, Mendoza A, Sevinsky JR, et al. Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics 2005;4(10):1487–502.
77. Schilling B, Rardin MJ, Maclean BX, Zawadzka AM, Frewen BE, Cusack MP, et al. Platform-independent and label-free quantitation of proteomic data using MS1 extracted ion chromatograms in Skyline: application to protein acetylation and phosphorylation. Mol Cell Proteomics 2012;11(5):202–14.
78. Hoekman B, Breitling R, Suits F, Bischoff R, Horvatovich P. msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative candidate biomarker studies. Mol Cell Proteomics 2012;11(6):M111.015974.
79. Zybailov B, Mosley AL, Sardiu ME, Coleman MK, Florens L, Washburn MP. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res 2006;5(9):2339–47.
80. Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, et al. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 2005;4(9):1265–72.
81. Trudgian DC, Ridlova G, Fischer R, Mackeen MM, Ternette N, Acuto O, et al. Comparative evaluation of label-free SINQ normalized spectral index quantitation in the central proteomics facilities pipeline. Proteomics 2011;11(14):2790–7.
82. Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, et al. Global quantification of mammalian gene expression control. Nature 2011;473(7347):337–42.
83. Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, et al. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002;1(5):376–86.
84. Hanke S, Besir H, Oesterhelt D, Mann M. Absolute SILAC for accurate quantitation of proteins in complex mixtures down to the attomole level. J Proteome Res 2008;7(3):1118–30.
85. Soufi B, Kumar C, Gnad F, Mann M, Mijakovic I, Macek B. Stable isotope labeling by amino acids in cell culture (SILAC) applied to quantitative proteomics of Bacillus subtilis. J Proteome Res 2010;9(7):3638–46.
86. Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, et al. Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol Cell Proteomics 2005;4(3):310–27.
87. de Godoy LM, Olsen JV, de Souza GA, Li G, Mortensen P, Mann M. Status of complete proteome analysis by mass spectrometry: SILAC labeled yeast as a model system. Genome Biol 2006;7(6):R50.
88. Sury MD, Chen JX, Selbach M. The SILAC fly allows for accurate protein quantification in vivo. Mol Cell Proteomics 2010;9(10):2173–83.
89. Kruger M, Moser M, Ussar S, Thievessen I, Luber CA, Forner F, et al. SILAC mouse for quantitative proteomics uncovers kindlin-3 as an essential factor for red blood cell function. Cell 2008;134(2):353–64.
90. Geiger T, Wisniewski JR, Cox J, Zanivan S, Kruger M, Ishihama Y, et al. Use of stable isotope labeling by amino acids in cell culture as a spike-in standard in quantitative proteomics. Nat Protoc 2011;6(2):147–57.
91. Asara JM, Christofk HR, Freimark LM, Cantley LC. A label-free quantification method by MS/MS TIC compared to SILAC and spectral counting in a proteomics screen. Proteomics 2008;8(5):994–9.
92. Yao X, Freas A, Ramirez J, Demirev PA, Fenselau C. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal Chem 2001;73(13):2836–42.
93. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999;17(10):994–9.
94. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004;3(12):1154–69.
95. Thompson A, Schafer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, et al. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 2003;75(8):1895–904.
96. Stahl-Zeng J, Lange V, Ossola R, Eckhardt K, Krek W, Aebersold R, et al. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol Cell Proteomics 2007;6(10):1809–17.
97. Drabovich AP, Jarvi K, Diamandis EP. Verification of male infertility biomarkers in seminal plasma by multiplex selected reaction monitoring assay. Mol Cell Proteomics 2011;10(12):M110.004127.
98. Stergachis AB, MacLean B, Lee K, Stamatoyannopoulos JA, MacCoss MJ. Rapid empirical discovery of optimal peptides for targeted proteomics. Nat Methods 2011;8(12):1041–3.
99. Pratt JM, Simpson DM, Doherty MK, Rivers J, Gaskell SJ, Beynon RJ. Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Nat Protoc 2006;1(2):1029–43.
100. Drabovich AP, Diamandis EP. Combinatorial peptide libraries facilitate development of multiple reaction monitoring assays for low-abundance proteins. J Proteome Res 2010;9(3):1236–45.
101. Deutsch EW, Lam H, Aebersold R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep 2008;9(5):429–34.
102. Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and storing protein identification data. J Proteome Res 2004;3(6):1234–42.
103. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010;26(7):966–8.
104. Mead JA, Bianco L, Ottone V, Barton C, Kay RG, Lilley KS, et al. MRMaid, the web-based tool for designing multiple reaction monitoring (MRM) transitions. Mol Cell Proteomics 2009;8(4):696–705.
105. Reiter L, Rinner O, Picotti P, Huttenhain R, Beck M, Brusniak MY, et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods 2011;8(5):430–5.
106. Rost HL, Malmstrom L, Aebersold R. A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol Cell Proteomics 2012;11(8):540–9.
107. Keshishian H, Addona T, Burgess M, Kuhn E, Carr SA. Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics 2007;6(12):2212–29.
108. Keshishian H, Addona T, Burgess M, Mani DR, Shi X, Kuhn E, et al. Quantification of cardiovascular biomarkers in patient plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics 2009;8(10):2339–49.
109. Rafalko A, Dai S, Hancock WS, Karger BL, Hincapie M. Development of a Chip/Chip/SRM platform using digital chip isoelectric focusing and LC-Chip mass spectrometry for enrichment and quantitation of low abundance protein biomarkers in human plasma. J Proteome Res 2011;11(2):808–17.
110. Qian WJ, Kaleta DT, Petritis BO, Jiang H, Liu T, Zhang X, et al. Enhanced detection of low abundance human plasma proteins using a tandem IgY12SuperMix immunoaffinity separation strategy. Mol Cell Proteomics 2008;7(10):1963–73.
111. Nicol GR, Han M, Kim J, Birse CE, Brand E, Nguyen A, et al. Use of an immunoaffinity-mass spectrometry-based approach for the quantification of protein biomarkers from serum samples of lung cancer patients. Mol Cell Proteomics 2008;7(10):1974–82.
112. Kulasingam V, Smith CR, Batruch I, Buckler A, Jeffery DA, Diamandis EP. “Product ion monitoring” assay for prostate-specific antigen in serum using a linear ion-trap. J Proteome Res 2008;7(2):640–7.
113. Drabovich AP, Okhonin V, Berezovski M, Krylov SN. Smart aptamers facilitate multi-probe affinity analysis of proteins with ultra-wide dynamic range of measured concentrations. J Am Chem Soc 2007;129(23):7260–1.
114. Anderson L, Hunter CL. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics 2006;5(4):573–88.
115. Whiteaker JR, Zhao L, Abbatiello SE, Burgess M, Kuhn E, Lin C, et al. Evaluation of large scale quantitative proteomic assay development using peptide affinity-based mass spectrometry. Mol Cell Proteomics 2011;10(4):M110.005645.
116. Kuhn E, Whiteaker JR, Mani DR, Jackson AM, Zhao L, Pope ME, et al. Inter-laboratory evaluation of automated, multiplexed peptide immunoaffinity enrichment coupled to multiple reaction monitoring mass spectrometry for quantifying proteins in plasma. Mol Cell Proteomics 2012;11(6):M111.013854.
117. Schiess R, Wollscheid B, Aebersold R. Targeted proteomic strategy for clinical biomarker discovery. Mol Oncol 2009;3(1):33–44.
118. Ptolemy AS, Rifai N. What is a biomarker? Research investments and lack of clinical integration necessitate a review of biomarker terminology and validation schema. Scand J Clin Lab Invest Suppl 2010;242:6–14.
119. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst 2001;93(14):1054–61.
120. Jenkins M, Flynn A, Smart T, Harbron C, Sabin T, Ratnayake J, et al. A statistician’s perspective on biomarkers in drug development. Pharm Stat 2011;10(6):494–507.
121. Pencina MJ, D’Agostino RB, Vasan RS. Statistical methods for assessment of added usefulness of new biomarkers. Clin Chem Lab Med 2010;48(12):1703–11.
122. Abraham G, Kowalczyk A, Loi S, Haviv I, Zobel J. Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context. BMC Bioinform 2010;11:277.
123. Ghosh D, Poisson LM. “Omics” data and levels of evidence for biomarker discovery. Genomics 2009;93(1):13–6.
124. Cramer DW, Bast RC Jr, Berg CD, Diamandis EP, Godwin AK, Hartge P, et al. Ovarian cancer biomarker performance in prostate, lung, colorectal, and ovarian cancer screening trial specimens. Cancer Prev Res (Phila) 2011;4(3):365–74.
125. Andre F, McShane LM, Michiels S, Ransohoff DF, Altman DG, Reis-Filho JS, et al. Biomarker studies: a call for a comprehensive biomarker study registry. Nat Rev Clin Oncol 2011;8(3):171–6.
126. Mitchell P. Proteomics retrenches. Nat Biotechnol 2010;28(7):665–70.