Drug Discovery Today: Technologies
Vol. 3, No. 4 2006
Editors-in-Chief: Kelvin Lam – Pfizer, Inc., USA; Henk Timmerman – Vrije Universiteit, The Netherlands
Techniques for rational design
Proteomic approaches in drug discovery
Timothy D. Veenstra
Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick Inc., NCI-Frederick, P.O. Box B, Frederick, MD 21702, USA
Section Editor: Hugo Kubinyi – University of Heidelberg, Heidelberg, Germany
E-mail address: T.D. Veenstra ([email protected])
1740-6749/$ © 2006 Elsevier Ltd. All rights reserved.
DOI: 10.1016/j.ddtec.2006.10.001

Finding a new drug against a chosen target usually involves high-throughput screening, wherein large libraries of chemicals are tested to determine their ability to modify the target. Before a target can be chosen, however, it must first be discovered. The omics era has brought unprecedented abilities to screen cells at the gene, transcript, protein, and metabolite level in search of novel drug targets. Of the big four classes of biomolecules, proteins remain the principal target of drug discovery. Recent developments in proteomic technologies have brought with them the ability to comparatively screen large numbers of proteins within clinically distinct samples. This capability has enabled non-biased studies in which the goal is to discover proteins that may act as suitable diagnostic biomarkers or therapeutic drug targets. Although proteomics technology has brought with it much hope, there are still many challenges associated with leveraging the experimental data into the discovery of novel drug targets.

Introduction
Drug discovery can be defined as a research process that identifies and develops a molecule that produces a desired effect in a living organism. Although the human cell is made up of a large number of genes, transcripts, proteins, and metabolites, most often a drug is designed to act upon a protein [1]. Although on the surface the process may seem straightforward – find a deranged protein that is causing an adverse effect and then use a molecule to block its effect – there are challenges, both technical and physiological, that make drug discovery a daunting task.
The first challenge is to find the protein target. Although this article will not discuss this issue at length, the initial need, before any instrumental analysis can be implemented, is the selection of suitable samples to be used in the discovery of the target. Fundamentally, the sample set should include materials acquired from patients who are affected by a specific disorder and those acquired from healthy, matched controls. Although human samples will be necessary at some point if a drug is to be approved for human use, drug discovery can often begin with samples that are much easier to obtain and manipulate, such as cell culture or a suitable animal model. Although the efficacy of a drug in a non-human system is often a poor predictor of its efficacy in a human, issues such as husbandry and genetic background can be controlled in non-human models.
Once a suitable model has been found, the next step is to identify the deranged protein(s) that is (are) responsible for the adverse condition being studied. This step is where the technology developments made in surveying the protein content of cells, tissues, and organisms have changed the design of drug discovery studies. In the past (and to a large extent currently), protein science was dominated by hypothesis-driven studies in which a specific protein or a small number of
proteins are studied to determine if they play a role in a particular cell phenotype. Today’s technologies allow discovery-driven studies in which the aim is to gather as much information as possible on as many proteins as possible to determine which proteins are contributing to the observed phenotype. As will be discussed later, although the ability to gather more information at the protein level would seem to simplify the problem and enable the identification of large numbers of novel drug targets, it has also resulted in a whole new set of questions that need to be considered and new hurdles that need to be cleared.
Proteomic technologies for the discovery of drug targets
The mention of proteomics most often evokes images of two-dimensional gels and mass spectrometers. If a non-biased approach is to be taken, the attributes of mass spectrometry (MS) make it arguably the most powerful technology for the discovery of protein drug targets. Although both two-dimensional gels and MS play a major role in proteomics, they are not the only technologies available, or necessary, for the discovery of drug targets (Fig. 1). The successful discovery of drug targets relies on a variety of techniques such as appropriate sample preparation, fractionation, protein measurement, and bioinformatics. Although much of the credit for the ability to characterize proteomes to the extent possible today is a direct result of the development of more powerful mass spectrometers, the contributions of sample preparation and protein fractionation should not be overlooked.
After the clinical sample set has been acquired, the design of the sample preparation steps is probably the most critical factor determining success or failure. The sample preparation steps need to be designed according to the level of information one has concerning the possible drug target. For example, if there is evidence, empirical or otherwise, that the target is a membrane receptor, ultracentrifugation should be incorporated into the sample preparation steps to isolate membranes from the samples (if possible). In the cases of serum and plasma, it is wise to remove high abundance proteins, such as albumin and immunoglobulins, because they can interfere with downstream analyses [2]. Unfortunately, in too many cases very little is known about the potential drug target. In this case, the aim is to characterize as many proteins within the sample as possible.
The next decision point entails what type of separation is best for the samples of interest. Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has been widely used in comparing proteomes extracted from comparative samples
Figure 1. A partial view of various proteomic technologies important in drug discovery.
Figure 2. High-throughput peptide identification using liquid chromatography (LC) coupled on-line with tandem mass spectrometry (MS). The mass spectrometer takes an MS scan and measures the intensity of the various peptide ions observed over time during the separation of a complex peptide mixture (a and b). The most abundant peptide ion (c) is isolated and subjected to collision-induced dissociation (d). The resulting tandem MS spectrum is analyzed by the appropriate software to identify the peptide sequence that would most probably give rise to this fragmentation pattern (e). This peptide sequence is then correlated back to its protein of origin. Modern mass spectrometers conduct steps (b) through (d) in a rapid cyclical fashion, enabling hundreds of peptides within complex mixtures to be identified per hour.
[3]. In 2D-PAGE, samples are fractionated based on their isoelectric point (pI) and molecular mass. After staining of the proteins, spots that are more or less intense within the comparative samples are excised from the gel and identified. Two-dimensional PAGE enables the relative abundances of proteins from different samples to be compared within the gel on the basis of the intensity of the protein staining. Mass spectrometry is the tool of choice for protein identification because of its throughput, sensitivity, and ability to identify proteins based on sequence-related information [4].
Another approach that is commonly used when the goal is to characterize large numbers of proteins is to circumvent 2D-PAGE and directly analyze the samples by MS [5]. One of the misnomers of this type of MS-based proteomics is that, in most studies, it is peptides rather than proteins that are characterized. In these bottom-up studies, the entire proteome is digested into tryptic peptides that are introduced into the mass spectrometer for identification. The digestion of potentially thousands of proteins results in potentially tens to hundreds of thousands of peptides. Therefore, it is necessary to fractionate this mixture before MS analysis. The most
commonly used prefractionation technique is strong cation exchange (SCX) followed by reversed-phase liquid chromatography. This combination can be done either on-line using a bi-phasic column or off-line, in which case fractions are collected from the SCX column. The reversed-phase separation is always done directly on-line so that peptides elute directly from this column into the mass spectrometer. The ability of the mass spectrometer to identify proteins rapidly is arguably the parameter that makes this instrumentation the driving force in proteomics today.
How exactly does a mass spectrometer identify peptides? As shown in Fig. 2, peptides are constantly eluting from a reversed-phase column into the mass spectrometer (Fig. 2a). During this separation, the instrument records the mass-to-charge (m/z) ratios of the peptides that are eluting at a specific time point (Fig. 2b). The instrument then selects and isolates the most intense ion observed in the previous scan (Fig. 2c) and fragments it, in a process referred to as tandem MS, to create a series of sequence ladders (Fig. 2d). After this fragmentation event the instrument
proceeds to isolate and fragment the next most abundant peptide ion. It repeats this sequential ion selection and fragmentation for the 3 to 10 most abundant peptide ions (depending on the operator setting). Today’s mass spectrometers are able to collect approximately 7000 tandem mass spectra in a single hour. All of these spectra are then analyzed using the appropriate software and a protein or genome database to identify the peptides that gave rise to the individual spectra (Fig. 2e). In a typical analysis, 10–20% of the spectra will give a hit, allowing between 700 and 1400 peptides to be identified confidently and then correlated to their proteins of origin.
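The selection cycle and the yield estimate described above can be illustrated with a minimal Python sketch. It is not the acquisition software of any instrument: the function names, the peak-list representation, and the example m/z and intensity values are all hypothetical, and only the figures of 7000 spectra per hour and a 10–20% hit rate come from the text.

```python
from typing import List, Tuple

# One peak in an MS scan: (m/z, intensity). Example values below are invented.
Peak = Tuple[float, float]


def select_precursors(ms_scan: List[Peak], top_n: int = 5) -> List[float]:
    """Return the m/z values of the top_n most intense ions, most intense first.

    This mimics the data-dependent step in which the instrument picks the most
    abundant ions from the previous MS scan for fragmentation.
    """
    ranked = sorted(ms_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:top_n]]


def expected_identifications(spectra_per_hour: int, hit_rate: float) -> int:
    """Number of spectra expected to yield a confident peptide match."""
    return round(spectra_per_hour * hit_rate)


# Hypothetical MS scan with four peptide ions.
scan = [(445.12, 1.2e5), (512.77, 8.9e5), (623.30, 3.4e5), (701.45, 5.0e4)]
precursors = select_precursors(scan, top_n=3)
print(precursors)  # [512.77, 623.3, 445.12]

# Yield estimate using the figures quoted in the text.
low = expected_identifications(7000, 0.10)
high = expected_identifications(7000, 0.20)
print(low, high)  # 700 1400
```

The operator setting of 3 to 10 precursors per cycle corresponds to the `top_n` argument in this sketch.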
Detection of changes in protein abundance
There are many attributes that can make a protein a potential drug target. Protein phosphorylation, which controls many aspects of cell physiology, is an important target for drug design. For example, Gleevec works by targeting a constitutively active tyrosine kinase, BCR-Abl, and shutting off the uncontrolled cell growth associated with the mutated gene product. G-protein-coupled receptors have historically been the most important group of drug targets. Not surprisingly, protein kinases are now the second most important class of drug target [6]. Other modifications on proteins such as
prenylation [7], methylation [8], and sulfation [9], among others, have also been investigated as potential drug targets. Although MS-based proteomics is capable of detecting such modifications, outside of phosphorylation the science is not mature enough to identify such changes on the scale and with the reliability necessary for drug target discovery.
The major focus in MS-based proteomics is to identify changes in the relative abundances of proteins between comparative sample sets. As mentioned above, 2D-PAGE provides direct measurement of changes in protein relative abundances courtesy of the protein staining intensity. In non-gel based approaches, other means must be used to identify those proteins that are differentially abundant between the sample cohorts. There are essentially two main strategies used to gain a measure of a protein’s relative abundance in different proteome samples: differential stable isotope labeling [10] and subtractive proteomics (Fig. 3) [11].
There are many different methods in which differential stable isotope labeling is used in quantitative proteomics; however, they all share the same basic premise: label amino acids within one proteome with a light isotope of a common element (e.g. 12C, 14N, and so forth) and label the other proteome with the matching heavy isotope (e.g. 13C, 15N). This labeling can be done either chemically
Figure 3. Quantitative methods for comparative proteomics. (a) Stable isotope labeling and (b) subtractive proteomics.
(e.g. in the case of isotope-coded affinity tags or iTRAQ), or metabolically (e.g. culturing of cells in medium enriched for a particular heavy stable isotope). Although there are subtle differences in the sample processing steps depending on the type of stable isotope labeling approach used, in either case the differentially labeled proteome samples are combined and digested into tryptic peptides. The peptide mixture is then analyzed through a combination of multidimensional chromatography coupled directly on-line with data-dependent tandem MS, as shown previously in Fig. 2. The relative abundance of the peptides within the different samples is measured in the MS scan, and MS/MS is used for identification. The result is a list of the relative abundances of proteins among the samples being compared. The hope is that a protein that shows an observable abundance difference between two (or among more) sets of samples is an intriguing candidate as a potential drug target and can be graduated to further validation and future clinical development.
Although stable isotope labeling methods enable the quantitation of thousands of proteins in complex clinical samples, they are low throughput, requiring days to compare even a couple of samples. They are generally limited to the comparison of no more than four samples, and metabolic stable isotope labeling methods are not applicable to the study of human samples. Although they have made a major impact in the analysis of cellular and tissue proteomes, stable isotope labeling methods have not been widely used in the study of biofluids. Although the reasons for this are not readily obvious, it is possible that the domination of serum and plasma by a few highly abundant proteins impairs the ability to modify lower abundance proteins chemically.
Subtractive proteomic approaches have been recently developed to simplify and increase the throughput of analyzing clinically important samples [11].
These methods do not rely on gels or stable isotopes, but quantitate proteins based on the number of peptides identified for each species (Fig. 3b). In this method, proteomes are extracted from a series of biological samples and digested into tryptic peptides. The peptide mixtures are then individually analyzed using multidimensional chromatography coupled directly on-line with a mass spectrometer operating in a data-dependent tandem MS mode (Fig. 2). The relative abundance of each protein across a set of samples is determined by the number of peptides identified for that specific protein. This quantitative method is based on the observation that the number of unique peptides identified for a protein is related to its abundance in the mixture. For example, albumin, which is present at 60–80 mg/mL, is consistently detected by large numbers of peptides (i.e. >20) in the MS analysis of serum, whereas lower abundance proteins such as cytokines are generally identified by one or two peptides [2]. This result is directly related to the concentration of albumin (60–80 mg/mL) compared with cytokine proteins (i.e. in the
ng/mL range). The subtractive approach is an attractive way to screen for changes in protein abundances across many samples because of its inherent simplicity and the fact that an unlimited number of samples can be intercompared, whereas stable isotope labeling methods in practice have been limited to two-way (e.g. ICAT) or four-way (e.g. iTRAQ) comparisons. Like most techniques, however, it also has its disadvantages. It is relatively low throughput: each sample would take a minimum of one day to acquire the necessary data even if the whole process were automated. The quantitative comparison is imprecise compared with stable isotope labeling methods and, therefore, changes of less than threefold cannot be determined accurately with a high confidence level. Low abundance proteins, although detectable, may not provide enough unique peptide identifications to be quantitated using this method.
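The peptide-counting comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of ref. [11]: the protein names and counts are invented, the minimum of three peptides and the pseudocount of one are hypothetical choices, and only the threefold cut-off is taken from the text.

```python
from typing import Dict, Optional


def fold_change(count_a: int, count_b: int, min_peptides: int = 3) -> Optional[float]:
    """Ratio of peptide counts (sample B over sample A).

    Returns None when neither sample reaches min_peptides, reflecting the point
    that low abundance proteins may yield too few peptides to quantitate.
    The min_peptides value is an assumption made for this sketch.
    """
    if max(count_a, count_b) < min_peptides:
        return None
    # A pseudocount of 1 (an assumption) avoids division by zero.
    return (count_b + 1) / (count_a + 1)


def flag_changes(control: Dict[str, int], disease: Dict[str, int],
                 threshold: float = 3.0) -> Dict[str, float]:
    """Return proteins whose peptide counts differ by at least `threshold`-fold."""
    flagged = {}
    for protein in sorted(set(control) | set(disease)):
        fc = fold_change(control.get(protein, 0), disease.get(protein, 0))
        if fc is not None and (fc >= threshold or fc <= 1 / threshold):
            flagged[protein] = fc
    return flagged


# Hypothetical numbers of unique peptides identified per protein.
control = {"albumin": 25, "protein_X": 2, "protein_Y": 12, "cytokine_Z": 1}
disease = {"albumin": 24, "protein_X": 11, "protein_Y": 3, "cytokine_Z": 2}

flagged = flag_changes(control, disease)
for name, fc in flagged.items():
    print(f"{name}: {fc:.2f}-fold")
```

In this toy data set, the albumin counts are nearly identical and are not flagged, and the cytokine has too few peptides to be quantitated at all, which mirrors the limitations noted in the text.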
Challenges in drug target discovery
Although MS-based methods are routinely able to detect hundreds of differences between biological samples, this ability is both a blessing and a curse. The blessing is in the ability to detect so many differences; the curse is in trying to determine which differences are most important and most likely to survive downstream pre-clinical validation. Obviously many differences, such as inflammatory or acute-phase response proteins, can be ruled out as potential drug targets, but determining the best candidates is still a difficult chore. One method that is now routinely used is to compare changes in the proteome with those observed in an mRNA array. Unfortunately, numerous studies have now shown that the correlation between the amount of a protein and its transcript’s abundance is poor. For example, in a study conducted in our laboratory comparing changes in the abundances of proteins and their transcripts during osteoblast differentiation, we found that the correlation was an abysmal 0.09 [12]. There are many potential reasons for this lack of correlation, ranging from post-transcriptional processing events to temporal differences in mRNA and protein expression. The data were then re-compared by binning proteins and their transcripts into functional pathways, and the correlation between these groups was determined. As shown in Table 1, a series of different functional pathways, including cell cycle regulation and apoptosis induction, showed significant correlation. This comparison allows potential drug targets to be localized within specific functional pathways that can be examined using hypothesis-driven studies directed towards the individual proteins.
Let us assume that global screening has yielded potential drug targets. It is at this point that many of the other technologies highlighted in Fig. 1, such as structural proteomics and binding measurements, become relevant.
Obviously the standard approach of conducting high-
Table 1. Pearson correlation values comparing overall functional pathways of proteins and their transcripts during osteoblast differentiation

BioCarta pathway                    Pearson correlation (protein vs. mRNA)    P-value
GO pathway                          0.501                                     0.047
Cell cycle                          0.829                                     0.048
Integrin-mediated signaling         0.763                                     0.046
G-protein coupled-receptor          0.963                                     0.046
Induction of apoptosis              0.963                                     0.050
Mitosis                             0.825                                     0.050
Rho protein signal transduction     0.831                                     0.049

Although poor correlation was observed at the individual protein/transcript level, good correlation was observed when the overall abundance changes seen within functional pathways were compared [12].
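The contrast between protein-level and pathway-level correlation summarized in Table 1 can be illustrated with a short Python sketch. It rests on an assumed reading of the binning step, namely that abundance changes are averaged over the members of a pathway at each time point before the Pearson correlation is computed; the exact procedure of ref. [12] may differ. The pathway members, time points, and fold-change values are invented for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pathway_correlation(members: List[Tuple[List[float], List[float]]]) -> float:
    """Correlate the mean protein profile with the mean transcript profile.

    Each member is (protein changes, mRNA changes) over the same time points.
    Averaging within the pathway is an assumption made for this sketch.
    """
    n_points = len(members[0][0])
    mean_protein = [sum(m[0][t] for m in members) / len(members) for t in range(n_points)]
    mean_mrna = [sum(m[1][t] for m in members) / len(members) for t in range(n_points)]
    return pearson(mean_protein, mean_mrna)


# Hypothetical abundance changes at three time points for two pathway members.
cell_cycle = [
    ([0.2, 0.9, 1.6], [0.5, 0.4, 1.9]),
    ([0.4, 0.7, 1.8], [0.1, 1.2, 1.5]),
]

individual_r = [pearson(protein, mrna) for protein, mrna in cell_cycle]
pathway_r = pathway_correlation(cell_cycle)
print("individual:", [round(r, 2) for r in individual_r])
print("pathway:", round(pathway_r, 2))
```

In this invented example the protein-to-transcript scatter of the individual members cancels when the profiles are averaged, so the pathway-level coefficient is higher than either individual one. Real data need not behave this cleanly.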
Figure 4. Proteomic technologies in the discovery of a biomarker and possible drug target for interstitial cystitis (IC). A series of chromatography steps were performed in which desired fractions were graduated based on their activity in a cell-based assay. The antiproliferative factor (APF) was identified by tandem mass spectrometry (MS) of a simplified fraction that still retained the desired activity. A biotinylated version of APF was synthesized and coupled to an avidin column to serve as a bait to isolate its receptor. The receptor was identified, and validated, by MS and Western blotting as CKAP4.
throughput screening of combinatorial libraries of compounds against the proposed target will play a critical role, but it is advantageous to have a purified version of the drug target available to determine its binding characteristics. The determination of protein structures has seen a tremendous increase in throughput in the last few years as automated methods of testing for the optimal expression conditions of recombinant proteins in different cell types have been developed [13]. Automation has also positively impacted the ability to purify expressed proteins, and more powerful X-ray beams and higher field nuclear magnetic resonance spectrometers along with the development of better software and faster hardware have increased the rate at which protein structures can be solved [14,15]. Knowledge of a drug target’s structure can be used to determine if it possesses homology to any other class of protein. This homology mapping can aid in either the selection or the design of an appropriate drug to inhibit the protein’s activity.
Application of proteomics to discovery of anti-interstitial cystitis drug target
Although the number of drug targets identified in the academic proteomics world is small, there have been successes. In our own laboratory, we have been working over the past few years on interstitial cystitis (IC), a chronic and painful bladder disorder that is characterized by thinning of the bladder epithelial lining. Our initial interest in IC was the discovery of a diagnostic biomarker, as it had been shown that urine from these patients contained a factor, named antiproliferative factor (APF), that inhibited bladder epithelial cell growth in vitro. By using a series of separation methods and testing each fraction for growth inhibition, we were able to isolate an active molecule that was identified using tandem MS as a sialoglycopeptide made up of a three-moiety sugar group bound to a nine-residue hydrophobic peptide, as shown in Fig. 4 [16].
On the basis of the structure of APF, we hypothesized that it exerted its effects on the bladder epithelial lining through binding to a membrane receptor. To find this receptor, a biotinylated form of APF was synthesized and coupled to an avidin column. A membrane preparation from explanted bladder epithelial cells from IC patients was solubilized and passed over the column. The column was equilibrated and bound material was eluted from the column using solutions containing increasing salt concentrations. Each of these fractions was then analyzed by SDS-PAGE. Two faint bands were detected on a silver-stained gel of the eluate collected at the highest salt concentration. These two bands were identified as CKAP4, a single-pass membrane receptor, and vimentin [17]. Reducing CKAP4 expression in bladder epithelial cells by siRNA diminished the growth inhibitory effects of APF on these cells. Incubation of epithelial cells with an anti-CKAP4 antibody also prevented the growth inhibition effects of APF. These results suggest that CKAP4 may be a possible druggable target to treat patients suffering the adverse effects of IC.
Although this project demonstrates the use of proteomics technology for finding a possible drug target, careful analysis shows that many more technologies beyond MS were critical in the discovery. For instance, a significant amount of chromatography was used to simplify the final mixture, enabling APF to be recognized, and a cell-based assay was critical for screening for the desired activity. Sample preparation, in the form of subcellular fractionation to obtain a membrane preparation, was instrumental in the identification of CKAP4 as a receptor for APF and a potential druggable target. Finally, functional studies to block CKAP4 activity in the presence of APF are critical to proving a link between APF and CKAP4. Although MS will continue to play a key role, the inclusion of other technological assays will bolster the chances of finding clinically valid protein drug targets in the future.

Conclusion
The scientific community is able to survey proteins like never before. The two most pressing needs for this type of technology are to find more effective biomarkers for disease detection and to discover proteins to which therapeutic drugs can be targeted. One sentiment that is often expressed in the MS community is that if we had more sensitive instruments, we could do better at identifying biomarkers or drug targets. Frankly, I disagree with this thinking. We are able not only to identify orders of magnitude more proteins than just ten years ago, but also to do it in a fraction of the time. Unfortunately, this capability has resulted in too many studies that rely too heavily on MS for the discovery of drug targets. One hurdle that must be overcome is to find ways to complement high-throughput MS data with other types of studies that cull the number of possible targets found in a global screening down to those targets that are most likely to pass future clinical trials.
Acknowledgements
This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract NO1-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the United States Government.
References
1 Hofstadler, S.A. and Sannes-Lowery, K.A. (2006) Application of ESI-MS in drug discovery: interrogation of noncovalent complexes. Nat. Rev. Drug Discov. 5, 585–595
2 Conrads, T.P. et al. (2006) Sampling and analytical strategies for biomarker discovery using mass spectrometry. Biotechniques 40, 799–805
3 Pietrogrande, M.C. et al. (2006) Decoding 2D-PAGE complex maps: relevance to proteomics. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 833, 51–62
4 Domon, B. and Aebersold, R.A. (2006) Mass spectrometry and protein analysis. Science 312, 212–217
5 Liu, H. et al. (2002) Multidimensional separations for protein/peptide analysis in the post-genomic era. Biotechniques 32, 898–902
6 Cohen, P. (2002) Protein kinases – the major drug targets of the 21st century? Nat. Rev. Drug Discov. 1, 309–315
7 Glenn, J.S. (2006) Prenylation of HDAg and antiviral drug development. Curr. Top. Microbiol. Immunol. 307, 133–149
8 Abbosh, P.H. et al. (2006) Dominant-negative histone H3 lysine 27 mutant derepresses silenced tumor suppressor genes and reverses the drug resistant phenotype in cancer cells. Cancer Res. 66, 5582–5591
9 Farzan, M. et al. (1999) Tyrosine sulfation of the amino terminus of CCR5 facilitates HIV-1 entry. Cell 96, 667–676
10 Aggarwal, K. et al. (2006) Shotgun proteomics using the iTRAQ isobaric tags. Brief. Funct. Genomic. Proteomic. 5, 112–120
11 Oh, P. et al. (2004) Subtractive proteomic mapping of the endothelial surface in lung and solid tumour for tissue-specific therapy. Nature 429, 629–635
12 Conrads, K.A. et al. (2005) A combined proteome and microarray investigation of inorganic phosphate-induced pre-osteoblast cells. Mol. Cell. Proteomics 4, 1284–1296
13 Vinarov, D.A. and Markley, J.L. (2005) High-throughput automated platform for nuclear magnetic resonance-based structure proteomics. Expert Rev. Proteomics 2, 49–55
14 Scapin, G. (2006) Structural biology and drug discovery. Curr. Pharm. Des. 12, 2087–2097
15 Tugarinov, V. et al. (2004) Nuclear magnetic resonance spectroscopy of high-molecular-weight proteins. Annu. Rev. Biochem. 73, 107–146
16 Keay, S. et al. (2004) An antiproliferative factor from interstitial cystitis patients is a frizzled 8 protein-related sialoglycopeptide. Proc. Natl. Acad. Sci. U S A 101, 11803–11808
17 Conrads, T.P. et al. CKAP4/p63 is a receptor for the frizzled-8 protein-related antiproliferative factor from interstitial cystitis patients. J. Biol. Chem. (in press)