Proteomics: Methodologies and Applications in Oncology
Bradly G. Wouters, PhD
Few technological developments have generated as much excitement, and as much skepticism, as proteomics over its potential to change clinical diagnostic and prognostic procedures. Proteomics is concerned with the characterization and function of all cellular proteins, the ultimate determinants of cellular function. As such, it represents the end result of all mechanisms of gene regulation and thus offers tremendous potential for characterizing biology. As happened with the genome, the scientific community is coming to grips with the fact that the proteome, although enormously complex, is finite. It is conceivable that we will learn the identity of all possible proteins, including all posttranslational modifications. The rate of protein discovery continues to accelerate, in large part because of improvements in mass spectrometry-based technologies coupled with improved genomic databases and bioinformatic tools. In addition, there is reason to believe that proteomics is on the verge of moving from a methodology that requires repeated proteome “discovery” to one that can profile proteomes more systematically. This review discusses current proteomic technologies and the efforts of scientists to move them into the clinic for use in patients treated with radiotherapy and other modalities.
Semin Radiat Oncol 18:115-125 © 2008 Elsevier Inc. All rights reserved.
Department of Radiation Oncology (Maastro Lab), GROW Research Institute, Maastricht University, Maastricht, The Netherlands. Supported in part by the Dutch Science Organization (ZonMW-NWO Top grant 912-03-047 to BW), the Dutch Cancer Society (KWF grant UM 2003-2821 to BW), and the EU 6th framework program (Euroxy program to BW). Address reprint requests to Bradly G. Wouters, PhD, Department of Radiation Oncology (Maastro Lab), Maastricht University, Universiteitssingel 50, PO Box 616, 6200 MD Maastricht, The Netherlands.
1053-4296/08/$-see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.semradonc.2007.10.008
If you ask 10 people to define what “proteomics” embodies, you are likely to receive 15 or so different answers. The term is used to describe a long list of different technologies and strategies designed to investigate the “output” of the genome specifically at the protein level. As such, proteomics is concerned with the identification, quantification, and function of proteins on a genome-wide scale. But what does this really mean? At the first level, proteomics involves a complete catalog of the proteome, the list of all expressed proteins derived from the genome. At this level alone, proteomics is already significantly more complex than the genome (DNA sequence) or the transcriptome (transcribed genes), both in the scale of the problem to be addressed and in the methodology available for its investigation. The proteome is shaped by alternative messenger RNA (mRNA) splicing, mRNA stability, protein modifications (eg, phosphorylation, ubiquitination, and glycosylation), and protein stability. Consequently, the number of distinct proteins exceeds the number of genes by at least a factor of 10. These same processes explain why, for many genes, mRNA levels correlate poorly with protein abundance. Measuring the proteome is also significantly more challenging than measuring the genome or transcriptome because protein levels vary across 10 orders of magnitude and there is no “amplifying” technology akin to the polymerase chain reaction for nucleic acids.1 This makes it extremely difficult to “see” proteins that are expressed at low levels, as is the case for some of the most important cellular regulators.
At a second level, the term proteomics also refers to efforts aimed at the functional characterization of the proteome. This includes studies of pathway activation, protein interactions (the interactome), and the dynamic composition of different protein complexes. At this level, proteomics is well suited to contribute to our basic understanding of cell biology and to efforts to integrate large-scale analyses to understand cell and tissue behavior at the system level (systems biology). The payoff for the added complexity of proteomics is the ability to describe the cell or tissue under investigation at a level much more closely related to its expected functional characteristics.
Although cancer is certainly a “genetic” disease, it is currently difficult or impossible to extrapolate from any given mutation (or single nucleotide polymorphism [SNP], for that matter) to the functional changes that take place within a given cell, much less within an entire tumor. The degree to which a particular mutation will affect cell proliferation, DNA repair, angiogenesis, hypoxia, metastasis, or other characteristics relevant to treatment response cannot currently be predicted. This is because individual genes are expressed differently in different cell types, are influenced by other gene products and mutations, and may be regulated in complex ways by the cellular microenvironment. A good example comes from the numerous familial tumor suppressor genes that have been identified through extensive genetic studies. For the most part, these genes (eg, BRCA1, BRCA2, Rb, VHL, and so on) encode proteins that are predicted to function in fundamental growth processes in all cell types. However, each causes cancer in only a limited number of tissues, presumably because of tissue-specific differences in gene regulation at both the transcript and protein levels as well as the interaction of cells with their microenvironment. Another good example comes from a recently published study describing one of the first results of an ongoing international effort to survey the entire “cancer genome.”2 This study showed that a very large number of different genes contribute to cancer development and that there is relatively little overlap between the genes mutated in tumors from different individuals. This heterogeneity is disappointing news for those hoping to develop therapeutic drugs that exploit specific mutational defects in individual cancers.
Proteomics may offer a solution in these cases because it is clear that the large number of genetic abnormalities found in cancer contribute to a much smaller number of changes in key signaling pathways that control the cancer cell phenotype.3 Because these pathways are often regulated by changes in protein modification, stability, or interaction, proteomics offers the possibility of identifying and assessing these changes in ways that are not possible with genomic or transcriptomic approaches. This includes identifying changes in important regulatory proteins that are not themselves targets of mutation in cancer. A good example is the PI3K/AKT/mTOR pathway, which is constitutively activated in many different cancers. Although this activation can result from many different genetic changes, its common downstream effects can be assessed at the protein level. Indeed, a recent study using phosphoprotein pathway mapping showed that Akt/mTOR activation was associated with poorer survival in childhood rhabdomyosarcoma.4
Proteomics Part 1: Protein Discovery
The development of proteomics into a science in its own right has occurred principally over the past 15 years and has been fueled by the continuous development of new mass spectrometry techniques coupled with advances in genome sequence data. A mass spectrometer consists essentially of 3 components: an ion source to create ionized species, a mass analyzer to measure the mass-to-charge (m/z) ratio, and a detector to count the number of ions at each m/z value. Mass spectrometry was initially used only for small, thermally stable molecules because techniques to create molecular ions from intact peptides, proteins, and other biomolecules were not available. This changed with the development of the “soft” ionization techniques electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), which remain in use today.5,6 ESI ionizes the sample directly from solution and is thus the preferred technique for automated workflows that follow liquid chromatography (LC)-based separation. MALDI uses a laser to desorb and ionize samples embedded in a dry crystalline matrix (eg, α-cyano-4-hydroxycinnamic acid). The mass analyzer is the heart of the instrument and is responsible for the precise mass measurement of the ionized species. Four main types of mass analyzer are used routinely in mass spectrometry:7 time of flight (TOF), quadrupole, ion trap, and Fourier transform ion cyclotron resonance. Each offers its own combination of accuracy, resolution, sensitivity, and dynamic range.7,8 Intact proteins are typically too large for accurate mass measurement and must therefore be broken down into smaller peptides before mass spectrometry (Fig 1). This is accomplished with trypsin, which cleaves proteins on the carboxyl side of arginine and lysine residues (generally not when the next residue is proline), creating a set of “tryptic peptides” characteristic of the protein. The amino acid composition of these peptides (and thus their molecular weight) is often unique to the parent protein and can therefore be used to infer its identity.
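The digest-and-lookup logic behind peptide mass fingerprinting can be illustrated with a short Python sketch. It assumes the simplified trypsin rule described above (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, monoisotopic residue masses, and a hypothetical toy protein database; the function names `tryptic_digest`, `peptide_mass`, and `fingerprint_match` are illustrative and not taken from any particular software package.

```python
# Sketch of in silico tryptic digestion and peptide mass fingerprinting (PMF).
# Assumptions: simplified trypsin rule, no missed cleavages, no modifications,
# monoisotopic masses; all function names are illustrative only.

# Monoisotopic residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free termini


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_match(observed: list[float], database: dict[str, str],
                      tol_ppm: float = 10.0) -> dict[str, int]:
    """Count how many observed masses each protein's digest can explain."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
            for m in observed
        )
    return scores
```

For example, with the hypothetical database `toy_db = {"protein_1": "AGKLLPRSTVEKWY", "protein_2": "MDEKFFGRHHAK"}`, calling `fingerprint_match([peptide_mass("LLPR"), peptide_mass("STVEK")], toy_db)` returns `{"protein_1": 2, "protein_2": 0}`. Counting several matched peptides per protein mirrors the requirement that multiple peptides support an identification; real search tools add missed cleavages, modifications, and statistical scoring.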
For a given protein, a single-stage mass spectrometer (usually a MALDI-TOF) will produce a characteristic peptide spectrum, and the measured mass of each peptide can potentially be used to infer the identity of the protein. This approach, referred to as peptide mapping or peptide mass fingerprinting, requires comparing the measured peptide masses with a list of calculated peptide masses from a comprehensive genomic or protein database. In general, several different peptides from the same protein must be matched before the identification can be trusted. This approach became efficient only after the sequencing of the human genome and the establishment of large protein databases. For more detailed structural information on individual peptides, including the amino acid sequence or the site and type of posttranslational modifications, instruments based on tandem mass spectrometry (MS/MS) are used. These instruments combine 2 mass analyzers and are the mainstay of proteomics research today. The first analyzer is used to select a particular ion species (peptide) of interest, which is then subjected to collision-induced dissociation (CID) to generate a series of peptide fragments that are analyzed in the second mass analyzer. The peaks in the resulting MS/MS spectrum provide additional information on the amino acid sequence and potentially on the site and nature of specific posttranslational modifications. At present, CID spectra cannot be directly converted into a full amino acid sequence, so protein identification with this approach also relies on searching protein sequence databases using 1 or more algorithms.9 Fortunately, established guidelines and statistical tools to ensure consistency and confidence in protein identification are becoming available.10-13
A typical proteomics experiment involves a number of steps. It often (although not always) begins with some separation of the proteome of interest into fractions before mass spectrometry. A number of methods can be used here, including affinity purification, subcellular fractionation, 1-dimensional or 2-dimensional gel electrophoresis (2DE), and various combinations of liquid chromatography. For many years, 2DE was the method of choice: protein mixtures are separated in gels on the basis of charge (in the first dimension) and mass (in the second dimension), and the gels can be stained with sensitive methods to reveal the location and intensity of individual proteins. Each spot represents a (relatively) pure protein and can be excised and subjected to tryptic digestion and MS to determine its identity. The main advantages of 2DE are its quantitative nature (see later) and its ability to resolve closely related (eg, modified) proteins. It also significantly reduces the technical demands on mass spectrometry because the selected protein spots are already highly purified. However, 2DE suffers from a limited dynamic range (revealing only the most abundant proteins) and slow throughput, and it is not amenable to the analysis of certain classes of proteins. At the same time, advances in mass spectrometry instrumentation have improved its ability to resolve complex protein mixtures and have increased throughput. As a result, there has been a steady move away from 2DE for whole proteomes and other complex protein mixtures toward liquid chromatography approaches coupled to MS/MS.
In LC-MS/MS-based approaches, entire proteomes or subproteomes are digested first, before any separation. This produces a highly complex mixture of peptides that is then separated through (multiple) fractionation steps and analyzed by automated MS/MS (usually with ESI) on high-throughput instruments.7 The goal here is to identify all possible peptides within the mixture and then to infer the presence of the original proteins from the measured MS/MS spectra. Because the connection between the peptides and the proteins from which they were derived is lost, this approach has been termed shotgun proteomics, by analogy with shotgun DNA sequencing, in which short DNA sequences are determined at random and then reassembled with various algorithms. Shotgun proteomics provides a significant improvement in proteome coverage but places large demands on data collection and analysis because of the enormous complexity of the peptide samples.
Figure 1 Protein discovery by mass spectrometry. Proteins isolated from cells, tissues, or other sources may be separated (eg, by 2DE) before further analysis. In this case, a small subset of proteins is normally selected for digestion and further analysis. Alternatively, entire proteomes or subproteomes may be collected and digested into smaller peptides. In this case, peptide separation by chromatographic or other techniques is required before entry into the mass spectrometer. Peptides are ionized by MALDI or ESI, and their mass-to-charge ratio is determined by the mass analyzer. This results in a mass spectrum containing a series of peaks whose masses are dictated by the composition of the peptides and may be sufficient for identification. Individual peptides are further characterized by subjecting them to CID followed by a second MS analysis, yielding the MS/MS or tandem mass spectrum. In either case, protein identification requires database searching and comparison. For further details, see text.
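The database-search step for MS/MS spectra can be sketched in the same spirit. The Python below computes the singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) that CID of a candidate peptide would be expected to produce, then scores each candidate by how many observed fragment peaks it explains. This is a deliberately naive shared-peak count under assumed conditions (singly charged fragments, no modifications, a fixed 0.02 m/z tolerance); the function names are hypothetical, and production search engines use far more elaborate scoring and statistics.

```python
# Sketch: theoretical b/y fragment ions and a naive shared-peak-count score.
# Assumptions: singly charged fragments, unmodified residues, fixed tolerance.

# Monoisotopic residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # carried by y ions, which contain the peptide C-terminus
PROTON = 1.007276  # charge carrier for singly charged fragment ions


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b ions (b1..b[n-1]) and y ions (y[n-1]..y1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(spectrum: list[float], peptide: str,
                      tol: float = 0.02) -> int:
    """Number of observed peaks lying within tol of any theoretical b/y ion."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(mz - t) <= tol for t in theoretical) for mz in spectrum)


def best_match(spectrum: list[float], candidates: list[str],
               tol: float = 0.02) -> tuple[str, dict[str, int]]:
    """Score every candidate peptide; return the best one and all scores."""
    scores = {p: shared_peak_count(spectrum, p, tol) for p in candidates}
    return max(scores, key=scores.get), scores
```

If a spectrum is built from the b and y ions of `"PEPTIDE"` and compared with an unrelated candidate such as `"GGGGGGG"`, `best_match` ranks `"PEPTIDE"` first with all 12 of its peaks matched. The candidate list would normally come from digesting a protein database, which is why, as the text notes, identification still depends on database searching.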
Proteomics Part 2: Proteome Comparison and Profiling
Quantification and Comparison
Much of the interest in proteomics lies not in simply listing all proteins in the proteome but in assessing quantitative differences in protein expression that occur under defined conditions. From a clinical point of view, these could be proteome changes that occur during the course of disease (for early screening), changes that identify subclasses of patients (for prognosis), or changes that reflect treatment response. The ability to quantify changes in the proteome also makes it possible to identify subsets of proteins that respond to specific cellular conditions hypothesized to be clinically relevant (eg, hypoxia). For this approach to work, however, changes in protein expression between samples must be compared with reasonable precision; to compare proteomes from different sources directly, quantification is essential. With 2DE, quantification can be achieved by comparing the intensity of protein spots among samples before excising them for identification by mass spectrometry. Classically, this involves comparing separated proteins on different gels and thus requires extensive gel “matching” to ensure that the same spot is being compared (Fig 2). This is not a trivial matter and currently requires significant computing power, limiting throughput. Alternatively, an approach known as difference gel electrophoresis (DIGE) can be used, in which samples are first labeled with 1 of 2 fluorescent dyes of different color. The 2 dyes are designed to produce similar changes in mass and charge and therefore do not differentially affect the position of the labeled protein. After labeling, the samples can be run in the same gel, removing the requirement for spot matching and greatly simplifying the determination of expression changes, which can be estimated from the difference in color intensity.
Gel-free proteomic experiments require different approaches to quantification. Unfortunately, the signal measured by the mass spectrometer is not, by itself, a useful measure of peptide abundance. This is because the relationship between the amount of a peptide in the sample and its ability to be ionized and detected in the mass spectrometer is highly complex and incompletely understood. As a result, the raw signal carries little reliable information about the abundance of the protein in the original sample. The same effect means that certain peptides are detected much more frequently than others and are found redundantly in many different proteomic analyses; these have been referred to as “proteotypic” peptides. Fortunately, an elegant solution to the quantification problem has been found, based on labeling peptides with stable isotopes such as 13C, 15N, 18O, or 2H. In laboratory experiments, these isotopes can be incorporated into proteins as they are synthesized by including isotope-labeled amino acids in the growth medium. Labeling can also be performed by chemical or enzymatic modification of the peptides after trypsin cleavage, which makes it possible to apply the technique to protein samples isolated from patients. A large number of such isotope-coded affinity tags have been developed and made commercially available, allowing the vast majority of peptides to be labeled.14 Peptides labeled in this way behave similarly to their unlabeled counterparts with respect to chromatographic separation and signal intensity in the mass spectrometer.
However, depending on the isotope chosen, a small, predictable change in peptide mass occurs that is easily detected by the mass spectrometer. As a result, 2 differentially labeled peptide mixtures (heavy and light) can be combined and then analyzed by MS. This produces an MS spectrum with a series of peptide pairs separated by the mass difference of the isotopes. The ratio of the heavy and light peak intensities in each pair provides a direct measurement of the relative abundance of that peptide in the 2 original samples. For unambiguous identification, the peptide pair can be further subjected to CID MS/MS. This technique has been used extensively over the past 5 years and has become the method of choice for quantitative proteomics.
Figure 2 Methodologies for proteome comparisons and profiling. Several approaches are in use for comparing protein expression across entire proteomes. Differences in protein expression can be determined by comparing spot location and intensities after 2DE. In this case, differentially expressed proteins are identified after excision of the protein spot of interest and MS or MS/MS analysis. High-throughput methods for characterizing large numbers of proteins normally require LC-based approaches. In this case, proteins from different samples are differentially labeled with isotopes that alter the mass of the peptide in a known way without influencing its other properties. Isotope labeling can occur metabolically in cells by supplying an isotope-labeled compound that is incorporated into proteins. Alternatively, peptides can be labeled with isotope tags of different mass after digestion. The amount of each peptide can be determined by comparing the intensity ratio of peptide pairs, which are separated in mass by an amount determined by the isotope. More recently, peptide-based scoring has been introduced, which uses known peptide “probes” that are isotopically labeled. Mixing these probes with a digested protein sample allows comparatively easy quantification of the amount of each probe target in the sample. For further details, see text.
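The heavy/light pairing step can be sketched in a few lines of Python. The sketch assumes a centroided MS1 peak list of (m/z, intensity) tuples with positive intensities, charge states 1 to 3, and a fixed label shift; the default of 8.014199 Da (the mass difference of a 13C6,15N2-labeled lysine) is only an example, and `find_pairs` is a hypothetical helper. Because a neutral mass shift of Δ appears as Δ/z on the m/z axis, the partner peak is searched for separately at each charge state.

```python
# Sketch: locating heavy/light isotope pairs in a centroided MS1 peak list.
# Assumptions: fixed label shift, charges 1-3, absolute m/z tolerance,
# positive intensities; find_pairs is a hypothetical helper.


def find_pairs(peaks: list[tuple[float, float]],
               mass_shift: float = 8.014199,
               charges: tuple[int, ...] = (1, 2, 3),
               tol: float = 0.01) -> list[tuple[float, float, int, float]]:
    """Return (light_mz, heavy_mz, charge, heavy/light ratio) for each pair.

    A neutral mass shift of mass_shift daltons appears as mass_shift / z
    on the m/z axis for an ion of charge z.
    """
    pairs = []
    for light_mz, light_int in sorted(peaks):
        for z in charges:
            expected = light_mz + mass_shift / z  # where the heavy partner sits
            for heavy_mz, heavy_int in peaks:
                if abs(heavy_mz - expected) <= tol:
                    pairs.append((light_mz, heavy_mz, z, heavy_int / light_int))
    return pairs
```

For the peak list `[(500.0, 1000.0), (504.0071, 2000.0), (700.0, 50.0)]`, the sketch reports a single doubly charged pair at m/z 500.0 and 504.0071 with a heavy/light ratio of 2.0, meaning the peptide is about twice as abundant in the heavy-labeled sample. Real quantification software also checks isotope envelopes, co-elution in time, and peptide identity before reporting a ratio.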
Limitations of Current Technologies for Routine Use
Despite these impressive developments in proteomic technologies, several limitations preclude their widespread use in experimental medicine, much less in routine clinical practice. The first problem is that, even with the methodologies outlined earlier, the number of peptides that can be identified from a complex mixture (eg, an entire digested cellular proteome) is still too small. A proteome sample may contain well over 100,000 different peptides, which causes problems for even the most modern LC-MS/MS-based machines. Too many peptides are presented to the mass spectrometer at the same time, and only a fraction (usually the most abundant) will be selected for CID MS/MS-based identification. Furthermore, the resulting MS/MS spectra lead to successful identification in only a fraction of cases. A nice example was published by Li and colleagues15: in an LC-MS/MS experiment, a complex tryptic digest yielded 2,720 peptides. Of these, 1,633 were selected for CID, and only 363 of those yielded an unambiguous peptide identification after database searching of the resulting MS/MS spectra (13% of the 2,720 peptides detected). Together, these factors limit the complexity of the proteome that can be analyzed and substantially increase the time and cost of comprehensive coverage. Although the technology continues to improve, characterizing and quantifying entire proteomes in this way, especially their rare proteins, still seems a long way off.
From Protein Discovery to Protein Profiling
MS-Based Profiling With Proteotypic Probes
Aebersold and colleagues have argued strongly that advancing proteomics to a more routine technology requires a fundamental shift in approach.16 Current approaches are based on rediscovering the entire proteome or, more specifically, the tryptic peptidome in each and every experiment. Every peptide within the digested proteome must be detected and subjected to CID, its MS/MS spectrum matched against databases, and the identity of its protein inferred. This is despite the fact that a wealth of valuable peptide data already exists from prior experiments. Aebersold has argued that it is possible to move from this “discovery” approach to one that involves “scoring” the levels of known peptides.16 This shift has already occurred for SNP analysis and transcriptome analysis, in which synthesized probes are now used to detect known SNPs and known expressed genes rather than rediscovering them through purification, amplification, and sequencing. A similar approach is possible for proteomics, in which one or more unique peptides could be chosen to represent each protein or protein isoform of interest. An MS-based platform could then be designed specifically to measure the expression of these representative peptides using peptide probes. The choice of representative peptide would be based on its detectability in mass spectrometry instruments together with its ability to uniquely identify its cognate protein. An obvious choice is the proteotypic peptides (described earlier), which have proven to be readily and consistently detectable in prior MS-based experiments. The proteomic data already available cover a sizable proportion of the entire proteomic space, and methods to identify proteotypic peptides for proteins not yet represented in the databases are in development.17 Such proteotypic peptides can be used to identify not only proteins but also specific splice forms, SNPs, or protein modifications. Experience has shown that 1 or 2 such peptides are sufficient to identify most proteins, although some may require a few more.17 The effort to catalog such reference peptides for all proteins has been aided by the PeptideAtlas project, which aims to enable comparison of data across different experiments and to provide uniform statistical validation.18 The basic idea behind this approach is to restrict mass spectrometry analysis of complex peptide mixtures to the list of reference peptides. This would be done by combining the tryptic digest of the sample of interest with a pool of synthesized reference peptides (probes) that have been isotopically labeled to alter their mass by a small, predictable amount. The combined sample could then be fractionated by LC and analyzed by MS. Such an analysis would yield both single peaks, representing either sample peptides for which no reference peptide is present or reference peptides for which no sample peptide is present, and double peaks in which reference and sample peptides are both present.
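The single-peak/double-peak logic of probe-based scoring can be sketched in Python. All inputs are hypothetical: a pool of reference peptides described by their light (unlabeled) neutral mass and the spiked amount in femtomoles, observed peaks already converted to neutral masses, and an example label shift of 8.014199 Da. The sketch assumes, as the text does, that labeled and unlabeled forms behave alike in the mass spectrometer, so the light/heavy intensity ratio multiplied by the spiked amount estimates the endogenous amount; `score_references` is an illustrative name, not an existing tool.

```python
# Sketch: scoring a digest against spiked, isotope-labeled reference peptides.
# Assumptions: neutral masses after charge deconvolution, equal response of
# light and heavy forms, fixed label shift; score_references is hypothetical.


def score_references(observed: list[tuple[float, float]],
                     references: dict[str, tuple[float, float]],
                     shift: float = 8.014199, tol: float = 0.01):
    """Classify each reference and quantify double peaks.

    observed:   list of (neutral_mass, intensity) for detected peaks
    references: {name: (light_mass, spiked_fmol)} for the probe pool
    Returns (per-reference results, unmatched sample-only peaks).
    """
    def near(mass):
        return [(m, i) for m, i in observed if abs(m - mass) <= tol]

    results, matched = {}, set()
    for name, (light_mass, spiked_fmol) in references.items():
        light = near(light_mass)           # endogenous (sample) peptide
        heavy = near(light_mass + shift)   # spiked, isotope-labeled probe
        matched.update(light + heavy)
        if light and heavy:                # double peak: quantify
            ratio = sum(i for _, i in light) / sum(i for _, i in heavy)
            results[name] = ("double peak", ratio * spiked_fmol)
        elif heavy:                        # single peak from the probe only
            results[name] = ("reference only", None)
        else:                              # probe missing: cannot quantify
            results[name] = ("probe not detected", None)
    unmatched = [p for p in observed if p not in matched]  # sample-only singles
    return results, unmatched
```

For instance, a reference spiked at 50 fmol whose light peak is twice as intense as its heavy peak is reported as a double peak with an estimated 100 fmol of endogenous peptide, while peaks that match no reference are returned separately as sample-only singles.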
The double peaks would be easily identifiable because their mass difference is determined by the stable isotope used to label the reference peptides. Furthermore, as described previously, the ratio of the intensities of the 2 peaks would allow easy quantification of the sample peptide relative to its reference. In this approach, the mass spectrometer is “tuned” to examine only the proteotypic reference peptides, ignoring the vast majority of other peptides and thus enabling much higher sensitivity and coverage of the proteome (limited essentially by the quality and availability of the reference peptides). It has been estimated that a complete proteome could be analyzed in just a few minutes using this technology and currently available mass spectrometers.16
Antibody Arrays
While MS-based approaches continue to mature, intermediate technologies that grew out of more standard methodologies have been developed to keep pace with the genomics revolution. These include antibody-based microarrays, essentially a high-throughput extension of single-biomarker studies that can be used to analyze a large number of proteins simultaneously. This technology relies heavily on the availability of good detection reagents with high affinity and sensitivity for each protein of interest, such as antibodies, aptamers, peptides, or phage lysates. Protein microarrays may be either forward phase, in which multiple “bait” molecules are deposited onto an array to “capture” specific targets,19 or reverse phase, in which small volumes of the sample itself are immobilized and then probed with specific antibodies.20 Forward-phase arrays have been used in 2 different formats. In the first, an approach similar to that used for complementary DNA microarrays is used, in which proteins from a test and a reference sample are labeled with different fluorophores (eg, Cy3 and Cy5) and then allowed to bind to the bait proteins on the array. In practice, this does not work well because labeling the proteins interferes with binding in unknown ways and because labeling of highly abundant proteins (eg, albumin in serum) can produce extensive background through nonspecific binding. Instead, a dual-antibody approach is often used, in which captured proteins are detected with a second antibody. This second antibody is usually tagged with biotin and then detected with an anti-biotin antibody. This approach also suffers from a number of important drawbacks. First, the requirement for 2 good antibodies (or other detection reagents) against distinct epitopes limits the technique to a small subset of proteins. Second, the simultaneous incubation of multiple antibodies increases cross-reactivity and nonspecific binding. Third, the affinities of different antibodies vary over many orders of magnitude, making it difficult to conduct experiments in which all targets are analyzed within the linear range. This leads to a large degree of variability across platforms and experiments and, in general, limits reproducibility and the ability to compare expression across different samples.
These limitations currently rule out any realistic hope of using these types of arrays for routine patient profiling, but they have found a place in biomarker discovery experiments. Reverse-phase arrays (RPAs) are also often referred to as a proteomic technique, although they are really a high-throughput single-protein marker approach. In much the same way that tissue microarrays extend the throughput of immunohistochemistry, RPAs can extend the throughput of more traditional Western blot analysis. In this type of experiment, a small volume of each whole-proteome sample is deposited at several dilutions (eg, from 1:2 to 1:16) onto an array together with many other samples (up to approximately 50). The array can then be probed with a single antibody to establish the expression pattern in all 50 samples at the same time. The ability to analyze small volumes, and thus small amounts of protein, is not trivial and is particularly challenging given that proteins cannot be amplified. With RPAs, however, amplification can be applied at the level of detection, using techniques similar to those used routinely in immunohistochemistry. The dilution series ensures that measurement can take place within a linear range. Thus, RPAs allow surveying the expression of specific proteins or phosphoproteins for which good antibodies exist but do not approach the comprehensive proteome analysis described previously for mass spectrometry. This approach is well suited to probing the activation or expression of proteins within specific pathways and has been used successfully in a number of cancer research investigations. These techniques may also be particularly valuable in clinical trials of drugs aimed at specific pathways.
How Can Proteomics Be Used Clinically?
Proteomic techniques have already been used clinically in a number of ways, and these studies have generated a great deal of both excitement and controversy. There are 2 principal ways in which proteomics can contribute in the clinic. The first is as a technique to discover novel biomarkers, or sets of biomarkers, that may be unique to a particular class of disease. Such biomarkers might be used to detect the presence of disease and thus prove useful in screening or treatment evaluation. However, they may also identify specific pathways or responses that are activated within tumors during carcinogenesis or in response to the unique microenvironment in which tumor cells find themselves. Uniquely expressed proteins may also be good candidates for drug development. Because new biological agents, such as the small-molecule kinase inhibitors and monoclonal antibodies that have recently entered the clinic, often have proteins as their direct targets, proteomics offers an attractive platform for discovering those targets. This approach was used successfully to identify a series of endothelial cell-surface proteins expressed exclusively in specific organs or tumors,21 using 2DE-based separation and MS. Antibodies were subsequently developed against 2 of these proteins and used to target therapy to the tumor vasculature. This approach has also been used to investigate proteins that may mediate differences in response to radiotherapy. Allal et al22 used 2DE and MALDI-TOF to compare proteins in biopsies taken from 17 patients before radiotherapy. Comparing the radiation-resistant and radiation-sensitive tumors yielded a short list of proteins associated with each response.
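The candidate-ranking step of a two-group comparison such as radiation-resistant versus radiation-sensitive tumors can be sketched with the Python standard library. This is an illustrative assumption, not the analysis used by Allal et al: each protein is ranked by the absolute Welch t statistic, with a log2 fold change reported alongside. The sketch omits p-values and multiple-testing correction (eg, Benjamini-Hochberg), which a real discovery analysis would need, and assumes positive intensities, at least 2 samples per group, and non-zero within-group variance.

```python
# Sketch: ranking candidate proteins between 2 patient groups.
# Assumptions: positive intensities, >= 2 samples per group, non-zero
# variance; no p-values or multiple-testing correction are computed.
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for 2 groups with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_candidates(levels: dict[str, dict[str, float]],
                    group_a: list[str], group_b: list[str]):
    """Return (protein, log2 fold change a/b, t) sorted by |t|, largest first.

    levels: {protein: {sample_id: intensity}}, eg spot volumes from 2DE
    """
    rows = []
    for protein, by_sample in levels.items():
        a = [by_sample[s] for s in group_a]
        b = [by_sample[s] for s in group_b]
        log2_fc = math.log2(mean(a) / mean(b))
        rows.append((protein, log2_fc, welch_t(a, b)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)
```

With small cohorts like the one described, such a ranking is best read as a shortlist of candidates for validation in larger clinical samples rather than as a finished classifier.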
Neither the small size of this study nor the limited scope of such investigations in general is a major concern, because the proteins identified are only candidates that require further investigation in much larger clinical samples. In these types of experiments, proteomics is applied simply as a technique to survey the entire proteome for candidate proteins that can then be investigated further with other approaches. Consequently, these experiments place fewer demands on reproducibility, feasibility, and throughput than would be required for routine clinical use. This has also allowed less reproducible and less comprehensive techniques, such as forward-phase antibody arrays, to be used. Such arrays have been applied to patient serum and have successfully identified novel circulating tumor markers in a number of cases.23-25 This type of approach can also be exploited in laboratory-based experiments to search for proteins that may reflect specific phenotypes hypothesized to be important for therapy response. For example, extensive efforts have been made
to discover predictive markers for hypoxia, proliferation, and radiosensitivity because of their perceived importance in determining radiotherapy response. There is reason to believe that various “omic” approaches have a role to play here (as also discussed in other articles in this issue). This is exemplified by a recent publication showing that hypoxia-responsive genes identified in vitro, the so-called hypoxic signature, provide valuable prognostic information in clinical datasets.26 This is impressive given the poor correlation between mRNA and protein levels, and it suggests that similar hypothesis-driven experiments using proteomic analyses will also have clinical value. In fact, several studies have already begun to investigate proteomic changes associated with the classic modulators of tumor response, including hypoxia, proliferation, and radiation sensitivity.27-30 The hope is that these protein-based signatures will also prove useful for predicting response in the clinic. The second way in which proteomics can be used clinically relates more directly to the subject of this series of articles (screening, prognostic, or predictive profiling). This is an extension of current biomarker and immunohistochemical procedures that have proven useful in pathological examination, such as prostate-specific antigen in prostate cancer and the estrogen receptor in breast cancer. Extending this characterization to 10, 100, or even thousands of proteins (perhaps one day all proteins) certainly offers exciting potential for this technology. The expectation is that by assessing the proteomic profile of an individual tumor, it should be possible to predict the present state of disease, the prognosis, or the expected response to specific types of therapy. As outlined in other papers within this series, this approach has already proven quite useful with mRNA expression profiling.
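Scoring an individual sample against a predefined signature, as with the hypoxic signature mentioned above, can be sketched as follows. This is a hedged illustration of one common generic approach (standardize each signature feature across samples, then average the z-scores per sample); it is an assumption here, not necessarily the method of the cited study, and the protein names and levels are hypothetical. The same scoring applies whether the features are mRNAs or proteins.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def signature_scores(levels, signature):
    """Average z-score of the signature features for each sample.

    levels: dict mapping feature name -> list of levels (one per sample)
    signature: iterable of feature names (hypothetical signature)
    """
    z = {name: zscore(levels[name]) for name in signature if name in levels}
    if not z:
        raise ValueError("none of the signature features were measured")
    n_samples = len(next(iter(z.values())))
    return [mean(z[name][i] for name in z) for i in range(n_samples)]


# Hypothetical levels for three samples (arbitrary units).
levels = {
    "PROT_A": [1.0, 2.0, 3.0],
    "PROT_B": [10.0, 20.0, 30.0],
    "PROT_C": [5.0, 5.0, 5.0],  # measured but not part of the signature
}
scores = signature_scores(levels, ["PROT_A", "PROT_B"])
print([round(s, 3) for s in scores])  # [-1.225, 0.0, 1.225]
```

A high score marks a sample in which the signature features are jointly elevated; whether such a score carries prognostic value can only be established by testing it against outcome in an independent clinical cohort.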
Given the advantages of measuring proteins outlined above, one might expect proteomic data to perform even better than mRNA profiling. However, proteomic technologies have evolved so rapidly and steadily that they quickly become obsolete, which has made routine use of any one of them difficult. Nonetheless, despite serious limitations associated with their immaturity, a number of investigations have reported promise for proteomic approaches in the clinic. Some examples are highlighted below.
Serum-Based Clinical Proteomics
An additional advantage of proteomics-based profiling over expression microarrays that has not been mentioned so far is that informative proteins are available from several different biological sources. There are obvious reasons to think that proteins derived from the tumors themselves would be useful, but it is also possible to survey proteins in various biological fluids, especially blood and urine, in which protein-based assays are already routinely performed. In fact, blood-based proteomics has arguably been the most thoroughly investigated approach to date. The blood contains a number of endogenous proteins, such as albumin, as well as many other low–molecular-weight (LMW) protein products that are collected during circulation. These include small proteins that may be secreted by various tissues, including tumors, and fragments of larger proteins derived from either secreted or intracellular proteins. The LMW range of the serum proteome (<50,000 Da) has been referred to as the peptidome, and analysis of this particular component has attracted both enthusiasts and skeptics.31 In 2002, Liotta and Petricoin used an MS-based platform known as surface-enhanced laser desorption ionization (SELDI) to investigate the peptidome of patients with ovarian cancer.32 The SELDI platform uses affinity chips to capture proteins onto a matrix, which then serves as the source for ionization and MS-based peptide analysis. Using this technique, they generated a series of LMW peptide spectra from the serum of a group of patients with ovarian cancer and compared them with a series of controls. Because the original SELDI machine had relatively poor resolution and lacked MS/MS capabilities, it was not possible to identify the individual peptides within the spectra. Nonetheless, using pattern matching and other sophisticated bioinformatic techniques, they were able to define a selected series of peptide peaks, or “signature,” that could uniquely identify the patients with ovarian cancer with extremely high specificity and sensitivity. Based on these experiments, the authors concluded that the blood can indeed be used as a rich source of disease-specific information.
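The cited SELDI studies used pattern-matching algorithms that are not described in detail here. As a much simpler, hedged stand-in (not the authors' method), the sketch below ranks spectral peaks by Welch's t-statistic between patients and controls; the peak labels and intensities are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_peaks(cases, controls, top=2):
    """Return the `top` peaks with the largest |t| between cases and controls.

    cases, controls: dict mapping peak label -> list of peak intensities
    """
    stats = {peak: welch_t(cases[peak], controls[peak]) for peak in cases}
    return sorted(stats.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Hypothetical intensities for three peaks in 4 patients and 4 controls.
cases = {
    "peak_A": [9.0, 10.0, 11.0, 10.5],
    "peak_B": [5.0, 5.2, 4.8, 5.1],
    "peak_C": [1.0, 1.4, 0.9, 1.2],
}
controls = {
    "peak_A": [5.0, 6.0, 5.5, 6.5],
    "peak_B": [5.1, 4.9, 5.0, 5.2],
    "peak_C": [3.0, 2.8, 3.3, 3.1],
}
for peak, t in rank_peaks(cases, controls):
    print(peak, round(t, 2))
```

Here peak_C (lower in patients) and peak_A (higher in patients) rise to the top. With thousands of peaks and only tens of samples, a ranking like this will also pick up peaks that separate the groups by chance or through differences in sample handling, which is one reason independent validation and identification of the underlying peptides matter.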
Shortly thereafter, Liotta and Petricoin's group and several others reported similar results in a wide range of cancers using the same approach.33-37 However, these results also generated a great deal of controversy because others argued that the LMW portion of the serum proteome is present at concentrations too low to be relevant and that it may represent only biological breakdown products or simply noise.38,39 The LMW portion of the serum is also extremely sensitive to sample collection, storage, and processing because proteases are frequently activated during and after collection. In the high–molecular-weight part of the serum, this proteolysis causes problems through loss of signal and reproducibility. Its effect on the LMW component is even greater because proteolysis generates new peptides that were not present in the blood as originally sampled. Normally, this would suggest that care must be taken to prevent protein degradation during blood collection, handling, and storage. However, a recent article has suggested that the fragments generated after blood collection may be responsible for the useful information within the LMW serum proteome, and that protein-stabilizing agents should therefore not be added to these samples.40 Nonetheless, this instability makes strictly consistent handling of the material to be analyzed essential for these techniques. Apprehension over the value of serum proteomics was fueled by studies that began to identify some of the LMW peptide peaks within the key peptide signatures described in the earlier studies. These signatures were found to consist largely of peptides derived from highly abundant proteins endogenous to the blood, such as transthyretin, as well as breakdown products of some low-abundance intracellular tissue proteins such as BRCA2.41 It was
difficult to understand why the informative peptide signatures would consist of non–tumor-derived or nonsecreted proteins. Liotta and Petricoin now hypothesize that the protein fragments themselves provide unique information that is distinct from that of the parental proteins from which they derive.42 They argue that the presence, abundance, and nature of these breakdown products are determined in part by the activity of enzymes within the cancer microenvironment or secreted into the bloodstream and, as such, provide novel disease-specific information. In other words, these protein fragments reveal an upstream enzymatic activity pattern that is unique to the tumor and can thus be of clinical value. Because the vasculature within tumors is often compromised and leaky, it has been assumed that tumor-derived proteins (including proteases) may enter the bloodstream much more easily and frequently. The fact that tumors often contain large numbers of dead or dying cells also increases the potential for tumor-derived proteins or fragments to end up in the blood. The fragments that are ultimately detected may thus represent breakdown products of proteins from the tumor cells, the tumor stroma, or even the blood itself. This hypothesis would also explain why protein fragments generated after collection could be clinically informative, because they may derive from the activity of tumor-derived enzymes. Another argument often made against using serum-based proteomics to identify tumor-specific proteins concerns the dilution that occurs as these products enter the bloodstream.
Secreted proteins with potential clinical value are diluted within 5 L of blood and are thus present at concentrations estimated to be approximately 1 billion times lower than those of serum proteins such as albumin.38 This explains why current detection of blood biomarkers requires highly sensitive enzyme-based assays such as the enzyme-linked immunosorbent assay (ELISA), and perhaps also why known tumor markers like prostate-specific antigen do not seem to turn up in these experiments. Furthermore, the LMW portion of the blood proteome is rapidly filtered out by the kidneys, so tumor-derived products are unlikely to accumulate over time. In fact, it has been estimated that tumor markers in the blood would be present at concentrations almost 1,000 times lower than the detection limit of the SELDI instrument used in most clinical experiments to identify tumor-specific markers.38 How, then, can the impressive results reported with this and similar instruments be explained? One potential explanation that has recently been recognized is that peptide fragments can be “amplified” by binding to carrier proteins such as albumin.31,41 Binding would protect these peptides from renal clearance and extend their half-life, raising their concentration to a point at which they become detectable. This has led to new approaches in which albumin (or other blood proteins) is purified specifically to harvest the bound peptides from the serum. In summary, the field of serum proteomics continues to move through periods of irrational exuberance followed by critical self-re-examination. The problems and controversies associated with this approach are to a large extent caused by
the immaturity of the technologies and their introduction into the clinic before a complete understanding of their strengths and weaknesses. Investigators such as Liotta and Petricoin deserve a great deal of credit for their efforts both to push this technology into the clinical realm and to understand what the clinical data have been telling us. Clearly, a number of questions remain as to which proteins and peptides are available in the serum and which technologies will be appropriate for measuring them.42 One generally agreed-upon point, however, is that the peptides within samples must be identified, rather than relying on peptide patterns alone, so that their prognostic value can be independently verified.
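The dilution and albumin-binding arguments made earlier in this section can be put into rough numbers. The sketch below is hedged: the two ratios (1 billion and 1,000) are the estimates quoted in the text, but the albumin level of 40 g/L is an assumed typical value, and the clearance model (one compartment, constant input, fast binding equilibrium, bound peptide cleared at an assumed albumin-like half-life) and all half-lives and bound fractions are hypothetical illustration inputs, not measured values.

```python
from math import log

# The two ratios are quoted in the text (ref 38); the albumin level is an
# assumed typical value (serum albumin is roughly 35-50 g/L).
ALBUMIN_G_PER_L = 40.0
DILUTION_VS_ALBUMIN = 1e9   # marker ~1 billion times below albumin
BELOW_SELDI_LIMIT = 1e3     # marker ~1,000 times below the SELDI limit

marker_g_per_l = ALBUMIN_G_PER_L / DILUTION_VS_ALBUMIN
implied_limit_g_per_l = marker_g_per_l * BELOW_SELDI_LIMIT
print(f"marker ~{marker_g_per_l * 1e9:.0f} ng/L, "
      f"implied SELDI limit ~{implied_limit_g_per_l * 1e6:.0f} ug/L")


def albumin_amplification(free_half_life, bound_half_life, fraction_bound):
    """Fold increase in a peptide's steady-state level due to albumin binding.

    One-compartment model with constant input rate R, so C_ss = R / k.
    Assuming fast binding equilibrium, the effective elimination constant is
    the bound-fraction-weighted mean of the free and bound constants.
    """
    k_free = log(2) / free_half_life
    k_bound = log(2) / bound_half_life
    k_eff = (1 - fraction_bound) * k_free + fraction_bound * k_bound
    return k_free / k_eff  # (R / k_eff) / (R / k_free)


# Hypothetical inputs in minutes: free peptide half-life 10 min, bound peptide
# assumed to follow albumin (~19 days), 90% of the peptide bound.
amp = albumin_amplification(10, 19 * 24 * 60, 0.9)
print(f"~{amp:.1f}-fold higher steady-state level")
```

Under these assumptions the gain approaches 1 / (1 - fraction bound) when clearance of the bound form is negligible, so the sketch shows the direction of the effect and how it scales with binding, not its real magnitude in patients.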
Tumor-Based Clinical Proteomics
Compared with serum, proteomics performed on tumor tissue is somewhat more straightforward because one can focus specifically on the content of the tumor cells themselves. However, because tumors consist of both tumor cells and a range of host-derived cells, careful consideration must be given to the starting material for comparison-based proteomic profiling. One approach is to use laser-capture microdissection to identify areas of interest within the tumor and then to isolate proteins specifically from these areas before proteomic analysis with MS. This approach was first used with the SELDI machine to compare early- and late-stage disease and identify proteins associated with tumor progression.43 Although very few cells are required (as few as 25 cells can yield a suitable MS peptide spectrum), the procedure is rather tedious and not easily amenable to high-throughput use. Consequently, there have been significantly fewer clinical studies using this approach than using serum-based proteomics. A second approach is known as direct tissue or imaging MS. With this technique, a small amount of MALDI matrix is deposited and dried directly onto a region of interest within a fresh-frozen tumor section.44 Peptides are ionized only from the area in which the MALDI matrix is present, ensuring that the MS peptide spectra represent proteins derived from the chosen area. In effect, this allows MS to be used in an “imaging” mode because spatial information is preserved. New techniques allow very small amounts of MALDI matrix to be deposited using ink-jet technology,45 and these deposits can be arranged in a grid (or any other pattern) that covers entire tumor sections. Such techniques can even be used to map proteins or small molecules in 3 dimensions by scanning a complete series of sections.
In an impressive study, direct tissue MS was applied to non–small-cell lung cancer in a training cohort of 42 lung tumors and 8 normal lung tissues.46 Proteomic peptide patterns were able to classify lung cancer histologies, classify nodal involvement with 85% accuracy, and distinguish patients with good and poor prognosis.
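The “imaging” mode of direct tissue MS described above can be sketched as follows, assuming that each matrix spot on a regular grid yields one peak list of (m/z, intensity) pairs. The data structure, grid size, m/z value, and tolerance are hypothetical, and real imaging pipelines also calibrate and normalize spectra before building images.

```python
def ion_image(spectra, nx, ny, target_mz, tol=0.5):
    """Build a 2-D intensity map (ion image) for a single m/z value.

    spectra: dict mapping (x, y) spot position -> list of (mz, intensity) peaks
    Returns ny rows of nx values; each value is the summed intensity of peaks
    within +/- tol of target_mz at that spot (0.0 where nothing matches).
    """
    image = [[0.0] * nx for _ in range(ny)]
    for (x, y), peaks in spectra.items():
        image[y][x] = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
    return image


# Hypothetical 3 x 2 grid of matrix spots; a peptide near m/z 1500 is
# enriched toward the right-hand side of the section.
spectra = {
    (0, 0): [(1500.1, 2.0), (2200.0, 8.0)],
    (1, 0): [(1500.0, 5.0)],
    (2, 0): [(1499.8, 9.0), (2200.2, 1.0)],
    (0, 1): [(2200.1, 7.0)],
    (1, 1): [(1500.3, 4.0)],
    (2, 1): [(1500.0, 10.0)],
}
for row in ion_image(spectra, nx=3, ny=2, target_mz=1500.0):
    print(row)
```

Repeating this for every m/z of interest gives one image per peptide, and stacking images from serial sections gives the 3-dimensional view mentioned in the text.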
Challenges and Outlook
This review has focused on the technological developments of mass spectrometry–based proteomics and on how these techniques are, and can be, used in the clinic. The technology continues to lag behind other genomic approaches, such as microarrays and SNP analysis, in its applicability to clinical use. However, as this technology matures, it continues to hold promise for breakthroughs in clinical diagnosis and prognosis and will undoubtedly find a place in clinical medicine in the not-too-distant future. It would be wise to anticipate this development and to begin preparing now to make clinical proteomics experiments possible. As with microarrays, one can anticipate that the bottleneck for exploiting proteomics technology will move quickly from a technological one to a clinical one. Specifically, investigators will need high-quality tumor or other biological starting material from large, well-controlled clinical studies. This highlights the importance of blood and tumor banking, with attention paid to protocols that will facilitate future proteomics experiments. Although no agreed-upon standard exists today, many recommendations are available on the proper handling and storage of various tissues.47-51 The most important consideration at the moment is to ensure that all samples are handled identically, with minimal variation in the time between collection and storage. The lessons learned from the limited number of clinical studies performed to date highlight the importance of such strict protocols for sample collection, processing, and storage. Another important issue not considered in this review is data analysis. Proteomics experiments generate enormous amounts of data, and this volume is likely to increase further in the near future.
Such large datasets raise the common problem of overfitting, which plagues many genomic approaches. Fortunately, proteomic analyses will be able to benefit greatly from the bioinformatic tools developed alongside the implementation of microarray and SNP approaches in the clinic. Proteomic data also pose their own unique set of challenges, and a close working relationship between bioinformaticians, mass spectrometry experts, and clinicians will clearly be required for successful integration of proteomics into clinical use.
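The overfitting problem can be illustrated with a small, hedged simulation (an illustrative sketch, not an analysis from this review). The data are pure noise, with many more hypothetical "peaks" than patients and random class labels, and a 1-nearest-neighbour rule is used only because it is simple. Scored on the same samples it was built from, the rule is always perfect, because each sample's nearest neighbour is itself; scored by leave-one-out on held-out samples, accuracy typically falls back toward chance (about 50% for 2 balanced classes).

```python
import random
from math import dist


def nearest_label(x, points, labels, skip=None):
    """Label of the point nearest to x (Euclidean), optionally skipping one index."""
    best = min((i for i in range(len(points)) if i != skip),
               key=lambda i: dist(x, points[i]))
    return labels[best]


rng = random.Random(0)
n_samples, n_peaks = 60, 200
# Pure noise: no peak carries any information about the (random) labels.
X = [[rng.gauss(0.0, 1.0) for _ in range(n_peaks)] for _ in range(n_samples)]
y = [rng.randint(0, 1) for _ in range(n_samples)]

# Resubstitution: each sample stays in the reference set, so it matches itself.
train_acc = sum(nearest_label(X[i], X, y) == y[i]
                for i in range(n_samples)) / n_samples
# Leave-one-out: the sample being predicted is excluded from the reference set.
loo_acc = sum(nearest_label(X[i], X, y, skip=i) == y[i]
              for i in range(n_samples)) / n_samples
print(f"training accuracy {train_acc:.2f}, leave-one-out accuracy {loo_acc:.2f}")
```

The same logic applies to peak selection: if the discriminating peaks are chosen using all samples before cross-validation, the held-out estimate is inflated as well, so selection must be repeated within each training fold.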
Acknowledgments
I would like to thank all members of our team at Maastro and, in particular, Marianne Koritzinsky for critically reading the manuscript.
References
1. Anderson NL, Anderson NG: The human plasma proteome: History, character, and diagnostic prospects. Mol Cell Proteomics 1:845-867, 2002
2. Sjoblom T, Jones S, Wood LD, et al: The consensus coding sequences of human breast and colorectal cancers. Science 314:268-274, 2006
3. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 100:57-70, 2000
4. Petricoin EF, Espina V, Araujo RP, et al: Phosphoprotein pathway mapping: Akt/mammalian target of rapamycin activation is negatively associated with childhood rhabdomyosarcoma survival. Cancer Res 67:3431-3440, 2007
5. Fenn JB, Mann M, Meng CK, et al: Electrospray ionization for mass spectrometry of large biomolecules. Science 246:64-71, 1989
6. Karas M, Hillenkamp F: Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 60:2299-2301, 1988
7. Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature 422:198-207, 2003
8. Domon B, Aebersold R: Mass spectrometry and protein analysis. Science 312:212-217, 2006
9. Baldwin MA: Protein identification by mass spectrometry: Issues to be considered. Mol Cell Proteomics 3:1-9, 2004
10. Carr S, Aebersold R, Baldwin M, et al: The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data. Mol Cell Proteomics 3:531-533, 2004
11. Keller A, Nesvizhskii AI, Kolker E, et al: Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74:5383-5392, 2002
12. Lam H, Deutsch EW, Eddes JS, et al: Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7:655-667, 2007
13. Domon B, Aebersold R: Challenges and opportunities in proteomics data analysis. Mol Cell Proteomics 5:1921-1926, 2006
14. Shiio Y, Aebersold R: Quantitative proteome analysis using isotope-coded affinity tags and mass spectrometry. Nat Protoc 1:139-145, 2006
15. Li XJ, Pedrioli PG, Eng J, et al: A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Anal Chem 76:3856-3860, 2004
16. Kuster B, Schirle M, Mallick P, et al: Scoring proteomes with proteotypic peptide probes. Nat Rev Mol Cell Biol 6:577-583, 2005
17. Mallick P, Schirle M, Chen SS, et al: Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol 25:125-131, 2007
18. Desiere F, Deutsch EW, King NL, et al: The PeptideAtlas project. Nucleic Acids Res 34(Database issue):D655-D658, 2006
19. Haab BB: Methods and applications of antibody microarrays in cancer research. Proteomics 3:2116-2122, 2003
20. Liotta LA, Espina V, Mehta AI, et al: Protein microarrays: Meeting analytical challenges for clinical applications. Cancer Cell 3:317-325, 2003
21. Oh P, Li Y, Yu J, et al: Subtractive proteomic mapping of the endothelial surface in lung and solid tumours for tissue-specific therapy. Nature 429:629-635, 2004
22. Allal AS, Kahne T, Reverdin AK, et al: Radioresistance-related proteins in rectal cancer. Proteomics 4:2261-2269, 2004
23. Hamelinck D, Zhou H, Li L, et al: Optimized normalization for antibody microarrays and application to serum-protein profiling. Mol Cell Proteomics 4:773-784, 2005
24. Miller JC, Zhou H, Kwekel J, et al: Antibody microarray profiling of human prostate cancer sera: Antibody screening and identification of potential biomarkers. Proteomics 3:56-63, 2003
25. Zhou H, Bouwman K, Schotanus M, et al: Two-color, rolling-circle amplification on antibody microarrays for sensitive, multiplexed serum-protein measurements. Genome Biol 5:R28, 2004
26. Chi JT, Wang Z, Nuyten DS, et al: Gene expression programs in response to hypoxia: Cell type specificity and prognostic significance in human cancers. PLoS Med 3:e47, 2006
27. Koritzinsky M, Seigneuric R, Magagnin MG, et al: The hypoxic proteome is influenced by gene-specific changes in mRNA translation. Radiother Oncol 76:177-186, 2005
28. Han YH, Xia L, Song LP, et al: Comparative proteomic analysis of hypoxia-treated and untreated human leukemic U937 cells. Proteomics 6:3262-3274, 2006
29. Yang F, Stenoien DL, Strittmatter EF, et al: Phosphoproteome profiling of human skin fibroblast cells in response to low- and high-dose irradiation. J Proteome Res 5:1252-1260, 2006
30. Flory MR, Lee H, Bonneau R, et al: Quantitative proteomic analysis of the budding yeast cell cycle using acid-cleavable isotope-coded affinity tag reagents. Proteomics 6:6146-6157, 2006
31. Liotta LA, Petricoin EF: Serum peptidome for cancer detection: Spinning biologic trash into diagnostic gold. J Clin Invest 116:26-30, 2006
32. Petricoin EF, Ardekani AM, Hitt BA, et al: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572-577, 2002
33. Petricoin EF 3rd, Ornstein DK, Paweletz CP, et al: Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst 94:1576-1578, 2002
34. Ornstein DK, Rayford W, Fusaro VA, et al: Serum proteomic profiling can discriminate prostate cancer from benign prostates in men with total prostate specific antigen levels between 2.5 and 15.0 ng/ml. J Urol 172:1302-1305, 2004
35. Adam BL, Qu Y, Davis JW, et al: Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 62:3609-3614, 2002
36. Li J, Zhang Z, Rosenzweig J, et al: Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 48:1296-1304, 2002
37. Ebert MP, Meuer J, Wiemer JC, et al: Identification of gastric cancer patients by serum protein profiling. J Proteome Res 3:1261-1266, 2004
38. Diamandis EP: Point: Proteomic patterns in biological fluids: Do they represent the future of cancer diagnostics? Clin Chem 49:1272-1275, 2003
39. Sorace JM, Zhan M: A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 4:24, 2003
40. Villanueva J, Shaffer DR, Philip J, et al: Differential exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 116:271-284, 2006
41. Lowenthal MS, Mehta AI, Frogale K, et al: Analysis of albumin-associated peptides and proteins from ovarian cancer patients. Clin Chem 51:1933-1945, 2005
42. Petricoin EF, Belluco C, Araujo RP, et al: The blood peptidome: A higher dimension of information content for cancer biomarker discovery. Nat Rev Cancer 6:961-967, 2006
43. Simone NL, Paweletz CP, Charboneau L, et al: Laser capture microdissection: Beyond functional genomics to proteomics. Mol Diagn 5:301-307, 2000
44. Stoeckli M, Chaurand P, Hallahan DE, et al: Imaging mass spectrometry: A new technology for the analysis of protein expression in mammalian tissues. Nat Med 7:493-496, 2001
45. Meier MA, de Gans BJ, van den Berg AM, et al: Automated multiple-layer spotting for matrix-assisted laser desorption/ionization time-of-flight mass spectrometry of synthetic polymers utilizing ink-jet printing technology. Rapid Commun Mass Spectrom 17:2349-2353, 2003
46. Yanagisawa K, Shyr Y, Xu BJ, et al: Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362:433-439, 2003
47. Rai AJ, Gelfand CA, Haywood BC, et al: HUPO Plasma Proteome Project specimen collection and handling: Towards the standardization of parameters for plasma proteome samples. Proteomics 5:3262-3277, 2005
48. Rai AJ, Vitzthum F: Effects of preanalytical variables on peptide and protein measurements in human serum and plasma: Implications for clinical proteomics. Expert Rev Proteomics 3:409-426, 2006
49. Timms JF, Arslan-Low E, Gentry-Maharaj A, et al: Preanalytic influence of sample handling on SELDI-TOF serum protein profiles. Clin Chem 53:645-656, 2007
50. West-Nielsen M, Hogdall EV, Marchiori E, et al: Sample handling for mass spectrometric proteomic investigations of human sera. Anal Chem 77:5114-5123, 2005
51. West-Norager M, Kelstrup CD, Schou C, et al: Unravelling in vitro variables of major importance for the outcome of mass spectrometry-based serum proteomics. J Chromatogr B Analyt Technol Biomed Life Sci 847:30-37, 2007