reviews
Emerging tandem-mass-spectrometry techniques for the rapid identification of proteins

Ashok R. Dongré, Jimmy K. Eng and John R. Yates III

State-of-the-art techniques such as liquid-chromatography-electrospray-ionisation tandem mass spectrometry have, in conjunction with database-searching computer algorithms, revolutionised the analysis of biochemical species from complex biological mixtures. With these techniques, it is now possible to perform high-throughput protein identification at picomolar to subpicomolar levels from protein mixtures. This article provides an overview of the techniques and methodologies available for the structural elucidation and identification of proteins and peptides from complex biological samples.
The past ten years have seen exponential growth in the field of biological mass spectrometry. The introduction of 'soft' ionisation techniques such as fast-atom bombardment1,2, electrospray3 and matrix-assisted laser desorption ionisation (MALDI)4–6 in the mid-1980s made it feasible to ionise thermally labile compounds such as proteins, peptides, carbohydrates and oligonucleotides and to introduce them into the gas phase. These advances in ionisation methods, coupled with various mass analysers, have produced new, technologically sophisticated mass spectrometry instrumentation that is currently being used to characterise the primary structures of peptides and proteins involved in complex biological processes. These processes, such as antigen processing and presentation, covalent modifications associated with signal transduction and single point mutations, can now be readily studied using tandem mass spectrometry7–9. The worldwide large-scale DNA sequencing efforts currently under way will further aid the advancement of mass spectrometry for studying complex biological processes10. The complete genomic sequences of organisms such as Saccharomyces cerevisiae, Haemophilus influenzae and Escherichia coli, and the impending complete sequencing of the human genome, will create rich genome-sequence databases11 that will enhance our ability to study both physiological and biochemical processes. The primary goal of a biological study is to identify the functional species involved in a process, which involves placing genes and gene products in a biological context. Tandem mass spectrometry is ideally suited to provide a powerful link between genomics and functional genomics (proteome) strategies, and to create new strategies for the study of biological processes.

A. R. Dongré ([email protected]), J. K. Eng (eqj@u.washington.edu) and J. R. Yates III ([email protected]) are at the Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195, USA. A. R. Dongré is also at the Department of Immunology and the Howard Hughes Medical Institute at the University of Washington, Seattle, WA 98195, USA.

TIBTECH OCTOBER 1997 (VOL 15)
Copyright © 1997, Elsevier Science Ltd. All rights reserved. 0167-7799/97/$17.00
Overview of tandem mass spectrometry and its application to peptide- and protein-sequence analysis

Tandem mass spectrometry (MS/MS) has become a very powerful tool in both the analytical and the biochemical sciences. A tandem mass spectrometer consists of two mass analysers separated by an ion-activation device (Fig. 1a). This arrangement makes it possible to isolate an ion of interest in the first mass analyser, induce it to dissociate by activation and then analyse the dissociation products in the second mass analyser. These products provide a 'structural fingerprint' for that particular ion and thereby detailed structural information (e.g. amino acid sequence) about it. Recent years have seen exponential growth in the application of tandem mass spectrometry to the sequence analysis of peptides and proteins12–22. Sequencing of peptides and proteins has been demonstrated with various ion-activation techniques, including low- and high-energy collision-induced dissociation (CID)12–21, surface-induced dissociation (SID)22,23 and photoinduced dissociation (PID)24. Currently, the most commonly used ion-activation technique for the dissociation of peptide ions is low-energy CID, in which peptide ions collide at low kinetic energy with an inert gas such as argon. The fragment ions produced upon peptide-ion activation are then analysed by the second mass analyser. These tandem mass spectrometry experiments can also be performed using a single mass analyser, such as a quadrupole ion trap or a Fourier-transform ion-cyclotron-resonance mass spectrometer, by separating the ion-isolation and mass-analysis events in time25,26. Under these low-energy gas-phase collision conditions, peptide ions fragment mostly at the amide (peptide) bonds along the backbone, generating a ladder of sequence ions.

If the charge is retained on the N-terminal portion of the fragment after cleavage of the amide bond, b-type ions are formed; if it is retained on the C-terminal portion, y-type ions are formed (Fig. 1b). A complete series of either or both ion types allows the amino acid sequence to be determined by subtracting the masses of adjacent sequence ions12,15,23,28–30. Mechanistic studies have shown that the position of basic amino acid residues such as arginine and lysine on the peptide backbone dictates which type of sequence ion will form: if arginine or lysine is at the N-terminus of the peptide, the tandem mass spectrum is dominated by b-type ions, and if it is at the C-terminus, the spectrum is dominated by y-type ions. If these basic residues lie somewhere in the middle of the peptide backbone, however, the spectrum becomes more complex. In general, the predictability of peptide-fragmentation patterns in a tandem mass spectrometer under both low- and high-energy collision conditions has led to the development of sequencing methods based on these data12–17.
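The ladder logic described above can be made concrete with a short Python sketch. It assumes singly charged ions and standard monoisotopic residue masses (values from common reference tables, not from this article), computes the b- and y-ion series for a peptide, and reads residues back from the mass differences between adjacent b ions. Leucine and isoleucine have identical masses, so this subtraction cannot tell them apart.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def b_ions(seq):
    """Singly charged b1..b(n-1): the N-terminal fragment keeps the charge."""
    ions, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(seq):
    """Singly charged y1..y(n-1): the C-terminal fragment (plus water) keeps the charge."""
    ions, total = [], WATER
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def read_ladder(ladder, tol=0.02):
    """Call residues from mass differences between adjacent ions of one series."""
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        diff = heavier - lighter
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    peptide = "PEPTIDE"
    print([round(m, 3) for m in b_ions(peptide)])
    print([round(m, 3) for m in y_ions(peptide)])
    print(read_ladder(b_ions(peptide)))
```

For PEPTIDE the b-ion differences read back as E, P, T, I/L and D; the first residue is not recovered from differences alone and has to come from b1 itself or from the complementary y series.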
[Figure 1 (a) Tandem-mass-spectrometry workflow: ionisation → separation (MS1) → activation → mass analysis (MS2). (b) Peptide backbone from N-terminus to C-terminus, marking the a, b and c cleavage positions (N-terminal fragments) and the complementary x, y and z positions (C-terminal fragments).]

Figure 1 (a) Schematic diagram of a typical tandem-mass-spectrometry setup. The two mass analysers, MS1 and MS2, are separated by an ion-activation region, in which the ions of interest (isolated by MS1) are dissociated for analysis in MS2. (b) Peptide backbone depicting the typical backbone-cleavage ions. Low-energy ion activation, which is the most widely used ion-activation method, yields mainly 'bn', 'yn' and 'an' type ions. Ion types 'cn', 'xn' and 'zn' are rarely observed under low-energy collision conditions. The subscript numbers associated with these fragment-ion types indicate the position of the backbone-cleavage site from the N- or C-terminus, depending on the ion type being considered. This nomenclature is a modified form13 of the Roepstorff and Fohlman nomenclature17.
Electrospray ionisation: basics and recent developments

The introduction of electrospray ionisation in the late 1980s by Fenn and co-workers created a powerful new method for ionising biological molecules and introducing them into the gas phase31, and several reviews of this technique and its uses in peptide and protein analysis have appeared since its inception32–37. In electrospray ionisation, a solution containing the analyte is sprayed across a high potential difference. The charged droplets from the spray are desolvated either by a countercurrent gas flow (that is, gas flowing in the opposite direction to the spray) or by passage through a heated capillary; the ions released on desolvation of these charged droplets are then analysed in a mass spectrometer. Over the past ten years, electrospray ionisation has been refined into a versatile technique for a wide variety of analytes differing in chemical makeup, size, complexity and biomolecular stability.
Micro- to nanoelectrospray ionisation

The principal challenge in protein biochemistry is the small quantity of protein available for study. Manipulating and analysing minute quantities of protein amidst a variety of complex chemical matrices is a daunting task that has driven the modification and improvement of conventional electrospray ionisation. Gale et al. reported the development of a low-flow-rate electrospray (microelectrospray) source that could efficiently spray small volumes of aqueous solutions, thereby avoiding the denaturation of biomolecules caused by the organic solvents required for effective conventional electrospray38. This source achieved pressure-assisted flow rates as low as 200 nl min-1 with etched fused-silica capillaries with inner diameters as small as 20 µm. The authors reported improved electrospray signal stability, together with a four- to tenfold improvement in sensitivity and a two- to fivefold decrease in sample consumption. Another design, by Caprioli and co-workers, achieved similar improvements in sensitivity, signal stability and signal-to-noise ratio39, with a detection limit of 1 femtomole (10⁻¹⁵ mole) of total sample consumed for small peptides such as methionine-enkephalin.

At the same time, Wilm and Mann published a detailed theoretical treatment of electrostatic dispersion during the electrospray ionisation process40,41. They reported that reducing the size of the emission zone at the tip of the Taylor cone would reduce the droplet size to 200 nm or below, which would allow the flow rate to be reduced to 25 nl min-1 (roughly an order of magnitude lower than other microelectrospray sources). This source design, commonly known as a nanoelectrospray source, can easily be adapted to many types of mass spectrometer. These low-flow designs are also beneficial because a small quantity of a protein or peptide mixture (typically 1–2 µl of solution) can be sprayed for an extended period (several tens of minutes) to perform a variety of mass-spectrometry experiments. In addition, this extended analysis time helps to achieve an acceptable signal-to-noise ratio through extensive signal averaging on peptides present at low-femtomole levels.

[Figure 2 (a) HPLC → packed microcapillary column → electrospray source → triple quadrupole mass spectrometer (MS1 quadrupole, collision cell, MS2 quadrupole) → detection. (b) Packed microcapillary column → microspray source → ion-trap mass spectrometer → detection system. (c) Pressure vessel, pressurised with nitrogen or helium, holding a vial of sample solution and connected to a capillary column.]

Figure 2 (a) Schematic of microcolumn high-performance liquid chromatography (HPLC) with a precolumn microsplitter interfaced to a tandem quadrupole mass spectrometer with an electrospray ionisation source. (b) A similar microcolumn-HPLC setup interfaced to an ion-trap mass spectrometer with a microspray ionisation source. (c) High-pressure apparatus for efficient sample loading and packing of microcapillary liquid-chromatography columns.
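The flow rates quoted for micro- and nanoelectrospray translate directly into spraying time. The sketch below divides sample volume by flow rate for the 200 nl min-1 and 25 nl min-1 figures given in the text. The square-root gain in signal-to-noise ratio from averaging N scans is a standard assumption for uncorrelated random noise, not a figure from this article, and the scan count used is hypothetical.

```python
import math


def spray_time_min(volume_ul, flow_nl_per_min):
    """Minutes a sample lasts: volume converted from µl to nl, divided by flow rate (nl/min)."""
    return volume_ul * 1000.0 / flow_nl_per_min


def averaging_gain(n_scans):
    """S/N improvement from averaging n scans, assuming uncorrelated random noise."""
    return math.sqrt(n_scans)


if __name__ == "__main__":
    for volume in (1.0, 2.0):
        print(f"{volume} µl: {spray_time_min(volume, 200):.0f} min at 200 nl/min, "
              f"{spray_time_min(volume, 25):.0f} min at 25 nl/min")
    # Hypothetical: averaging 100 scans instead of a single scan
    print(f"S/N gain for 100 scans: {averaging_gain(100):.0f}x")
```

At 25 nl min-1, 1–2 µl of sample lasts 40–80 min, in line with the "several tens of minutes" of analysis time described above; at 200 nl min-1 the same volume lasts only 5–10 min.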
Combining liquid chromatography with mass spectrometry using electrospray ionisation
Mass spectrometry is inherently well suited to mixture analysis, and this suitability is further augmented by electrospray ionisation, which allows liquid-chromatographic techniques to be coupled to tandem mass spectrometers. Individual species in a complex mixture can then be identified by chromatographic separation based on their chemical properties, followed by separation based on mass:charge (m:z) ratio and subsequent structural characterisation in the MS/MS mode. For most studies of peptides and proteins, the low flow rates used with small-bore liquid-chromatography (LC) columns offer the greatest sensitivity42; to achieve these low flow rates, microcolumn LC offers a favourable balance between flow rate, performance, reduced sample loss and ease of column construction. Protocols for column construction are described in detail by Shelley et al.43 and Kennedy et al.44 Interfacing microcolumn LC to electrospray ionisation is straightforward. Figure 2 illustrates a microcolumn-LC setup on a tandem mass spectrometer; standard off-the-shelf high-performance liquid-chromatography (HPLC) pumps can be used, but a precolumn splitter is needed to achieve low flow rates of 0.5–1.0 µl min-1. Davis et al. have built an elegant HPLC system that uses preformed gradients to circumvent the limitations of flow splitting45. The system produces reproducible gradients and allows the use of stop-flow techniques to increase the analysis time for a particular chromatographic peak. Furthermore, when working with capillary columns and very small amounts of sample, conventional sample-loading injection loops can lead to sample losses; to minimise these losses, samples can be pneumatically injected directly onto the microcapillary column (Fig. 2c).
To obtain primary structural information for peptides, tandem-mass-spectral data must be acquired on the chromatographic peaks eluting from the microcolumn LC into the tandem mass spectrometer. Two approaches are generally employed. The first requires premeasurement of the m:z values of all the components of the complex mixture, followed by a separate analysis of the same mixture to obtain tandem mass spectra. The shortcomings of this approach include high sample consumption and the need for considerable manual intervention, both of which limit tandem-mass-spectral data acquisition. The second approach involves automated data acquisition under computer control of the instrument, which allows both MS and MS/MS data to be acquired in a single analysis. Some tandem mass spectrometers can be operated through instrument control languages that allow precise and rapid control of all instrument parameters46–48. This approach improves the efficiency, reliability and accuracy of data acquisition, which is essential for rapid, high-throughput analysis of complex mixtures of peptides and proteins. A typical computer algorithm for the automated acquisition of tandem mass spectra involves the following steps: (1) scan a mass analyser to record the m:z values of ions entering it; (2) search this mass spectrum for an m:z value that meets certain preset guidelines; (3) select that m:z value and subject the ion to MS/MS conditions calculated from the m:z value of the selected precursor ion; (4) acquire the tandem mass spectrum; (5) return the instrument to step 1. A tandem mass spectrum is obtained every time the guidelines in step 2 are fulfilled. An algorithm based on these steps, used with a tandem quadrupole mass spectrometer, can theoretically generate a tandem mass spectrum every five seconds, so a single 60 min microcolumn-LC run with automated MS/MS can yield as many as 400 tandem mass spectra from a complex mixture of peptides. The number and complexity of these spectra make data analysis and structural characterisation of peptides by conventional means (i.e. manual interpretation and sequencing) almost impossible. To circumvent this problem, we have developed a fully automated computer algorithm that searches protein and nucleotide databases, using a powerful correlation-analysis method to compare sequences from the database with tandem mass spectra49. Automated microcolumn-LC-MS/MS followed by computer-assisted data analysis provides a unique tool for rapid, high-throughput protein identification and sequence analysis from complex mixtures50.
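The five-step acquisition cycle can be sketched in Python as a simulation rather than instrument-control code. Each survey scan is a list of (m:z, intensity) peaks. The "preset guidelines" of step 2 are assumed here to be "the most intense ion above an intensity threshold that has not already been selected"; the exclusion list and the linear collision-energy rule in `collision_energy` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Peak:
    mz: float
    intensity: float


def select_precursor(survey: List[Peak], threshold: float,
                     excluded: List[float], tol: float = 0.5) -> Optional[Peak]:
    """Step 2 (assumed guideline): most intense peak above threshold not already selected."""
    candidates = [p for p in survey
                  if p.intensity >= threshold
                  and all(abs(p.mz - x) > tol for x in excluded)]
    return max(candidates, key=lambda p: p.intensity) if candidates else None


def collision_energy(mz: float) -> float:
    """Step 3: MS/MS conditions derived from precursor m:z (hypothetical linear rule)."""
    return 0.05 * mz + 5.0


def acquire(surveys: List[List[Peak]],
            threshold: float = 1000.0) -> List[Tuple[float, float]]:
    """Run the cycle over successive survey scans; returns (precursor m:z, energy) per MS/MS."""
    excluded: List[float] = []
    msms: List[Tuple[float, float]] = []
    for survey in surveys:                      # step 1: a survey scan is recorded
        precursor = select_precursor(survey, threshold, excluded)
        if precursor is None:                   # guidelines not met: back to step 1
            continue
        energy = collision_energy(precursor.mz)
        msms.append((precursor.mz, energy))     # step 4: stands in for acquiring the MS/MS scan
        excluded.append(precursor.mz)
    return msms                                 # step 5 is the loop returning to step 1


if __name__ == "__main__":
    scans = [
        [Peak(500.3, 5000.0), Peak(650.8, 2000.0)],
        [Peak(500.3, 4800.0), Peak(650.8, 2500.0)],
        [Peak(400.2, 200.0)],
    ]
    for mz, ce in acquire(scans):
        print(f"MS/MS on m:z {mz:.1f} at collision energy {ce:.1f}")
```

In this toy run the second survey scan triggers MS/MS on the next most intense ion because the first has already been selected, and the third scan triggers nothing because no peak clears the threshold.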
Data analysis using computerised search algorithms

A number of computer algorithms are available to perform peptide-mass mapping by searching protein databases51. Most of these algorithms use single-stage mass-spectral data. The m:z values of peptides from a proteolytic digest of a protein (which in turn yield the Mr of each peptide) are used to search protein databases for a list of peptides with matching Mr and their source proteins. Potential shortcomings of this approach include difficulty in obtaining enough peptides for unambiguous identification, owing to inefficient proteolysis or small sample quantities, and unexpected m:z shifts resulting from protein modifications such as phosphorylation and glycosylation. However, some elegant scoring algorithms, such as MOWSE, perform peptide-mass mapping with a considerable degree of accuracy52. Recent advances in the mass accuracy of time-of-flight (TOF) mass spectrometers have also contributed to improvements in database searching53.
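Peptide-mass mapping can be illustrated with a toy example. The sketch assumes trypsin specificity (cleavage after K or R except before P, a commonly used rule), neutral monoisotopic masses, no missed cleavages, and a plain count of matched masses as the score. It is not MOWSE, whose scoring is more elaborate, and the two protein sequences are hypothetical placeholders.

```python
import re

# Monoisotopic residue masses (Da), standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(protein):
    """Cut after K or R unless the next residue is P (assumed rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz_to_mass(mz, charge):
    """Neutral mass M from an observed [M + zH]z+ ion at the given m:z."""
    return mz * charge - charge * PROTON


def count_matches(observed_masses, protein, tol=0.05):
    """Number of observed masses matching some tryptic peptide of the protein."""
    digest = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - d) <= tol for d in digest) for m in observed_masses)


if __name__ == "__main__":
    proteins = {  # hypothetical sequences, for illustration only
        "protein_A": "AAKPGGRLLKEDFR",
        "protein_B": "GASPVTLDEKFHYWR",
    }
    # Simulated observations: [M+2H]2+ of LLK and [M+H]+ of EDFR.
    ions = [((peptide_mass("LLK") + 2 * PROTON) / 2, 2),
            (peptide_mass("EDFR") + PROTON, 1)]
    masses = [mz_to_mass(mz, z) for mz, z in ions]
    for name, seq in proteins.items():
        print(name, count_matches(masses, seq))
```

Here protein_A explains both observed masses and protein_B explains neither. The example also shows the weakness noted in the text: a modification that shifts a peptide's mass would silently remove it from the count.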
[Figure 3: an experimental tandem mass spectrum (MS1, collision cell, MS2) is searched against a nucleotide or protein sequence database, followed by cross-correlation analysis of the retrieved sequences. Examples shown (source: identified sequence):
1. Enolase: EEALDLIVDAIK
2. Pyruvate kinase: NPTVEVELTTEK
3. Hexokinase: IEDDPFVFLEDTDDIFQK
4. Hypusine: APEGELGDSLQTAFDEGK
5. BMH1: QAFDDAIAELDTLSEESYK]

Figure 3 A depiction of the database search algorithm SEQUEST. The experimental tandem mass spectrum is compared with theoretical tandem mass spectra reconstructed from sequences retrieved from protein or nucleotide databases. These sequences are first scored against the experimental spectrum to obtain preliminary scores. The top 500 theoretical spectra from the preliminary scoring are then subjected to cross-correlation analysis, and the peptide with the highest score is deemed to be the sequence for the experimental spectrum.
Computerised search algorithms for tandem-mass-spectral data analysis

Mass-mapping algorithms depend on the set of m:z values produced by proteolysis of a protein, so mass mapping can be ineffective for a single peptide or a small set of peptides derived from proteolytic digestion. Database searching with tandem mass spectra is ideal for identifying a protein from a limited set of peptides or for determining a peptide's amino acid sequence. In 1994, our laboratory reported the development of database-searching algorithms using tandem-mass-spectrometry data49. Later the same year, Mann and Wilm reported an algorithm for searching databases on the basis of partial sequence information obtained from the tandem mass spectrum of a peptide54. Manual interpretation of the tandem mass spectrum yields a short sequence string, three to six amino acids long, called a 'sequence tag'. The program uses this short sequence, together with the peptide molecular weight, the product-ion m:z values and the cleavage specificity of the protease, to generate a small number of potentially matching amino acid sequences. If the search does not identify a sequence, additional searches can be performed by changing the search parameters to exclude 'zones', such as the N-terminal or C-terminal region, signified by the fragment m:z values. When a fragment ion is removed from consideration, specificity decreases and many sequences can match; all such matches must then be evaluated manually. The use of sequence tags for database searching has been demonstrated with MS/MS data of intact proteins and also, with limited success, with postsource decay (PSD) MALDI MS/MS data of peptides56. Other search methods using tandem-mass-spectral data were later introduced that are based on the m:z value of the precursor ion, two or more fragment ions and the proteolytic cleavage specificity (http://chait-sgi.rockefeller.edu/ or http://rafael.ucsf.edu/mstag.html); unlike Mann's sequence-tag approach, these programs do not require prior manual interpretation of the MS/MS data to identify an amino acid sequence.

The other approach, developed by our laboratory, uses 'uninterpreted' tandem-mass-spectral data to search protein and nucleotide sequence databases for accurate sequence information about the peptide and/or protein under investigation. This approach, called SEQUEST, was the first published computer algorithm to use tandem mass spectra to search sequence databases, and it exploits the predictability of peptide fragmentation in tandem mass spectrometry49. The underlying principle of SEQUEST is cross-correlation analysis between the experimental tandem mass spectrum and theoretical tandem mass spectra generated from peptide sequences with a molecular weight similar to that of the experimental precursor ion (Fig. 3). The peptide sequence with the highest score is deemed to be the one that produced the experimental tandem mass spectrum. Amino acid sequences in a protein database are scanned to find linear combinations of amino acids that are within a certain mass tolerance of the peptide mass.
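The candidate-selection step can be sketched as follows. This is a simplified illustration, not SEQUEST's code: it scans a protein for contiguous subsequences whose neutral mass lies within a user-set tolerance of the precursor mass, and can optionally require tryptic ends using the assumed rule "after K or R, not before P". Residue masses are standard monoisotopic values, and the protein sequence in the demo is a hypothetical placeholder.

```python
# Monoisotopic residue masses (Da), standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_site(protein, i):
    """True if there is an (assumed) trypsin cleavage site between positions i-1 and i."""
    return protein[i - 1] in "KR" and protein[i] != "P"


def candidates(protein, precursor_mass, tol=0.5, tryptic=False):
    """Contiguous subsequences whose neutral mass is within tol (Da) of precursor_mass."""
    hits = []
    n = len(protein)
    for start in range(n):
        if tryptic and start > 0 and not tryptic_site(protein, start):
            continue  # N-terminus must follow a cleavage site or be the protein start
        mass = WATER
        for end in range(start, n):
            mass += RESIDUE_MASS[protein[end]]
            if mass > precursor_mass + tol:
                break  # residue masses are positive, so longer windows only get heavier
            if abs(mass - precursor_mass) <= tol:
                c_term_ok = end == n - 1 or tryptic_site(protein, end + 1)
                if not tryptic or c_term_ok:
                    hits.append(protein[start:end + 1])
    return hits


if __name__ == "__main__":
    protein = "AAKPGGRLLKEDFR"   # hypothetical sequence
    precursor = 372.273645       # neutral mass of LLK
    print(candidates(protein, precursor, tol=1.2))                # ['AKPG', 'LLK']
    print(candidates(protein, precursor, tol=1.2, tryptic=True))  # ['LLK']
```

A wider tolerance admits more windows (here AKPG, about 1.06 Da lighter, slips in at 1.2 Da), which costs search time; in this sketch, as in the approach described, it is the subsequent scoring rather than the mass filter that has to separate the true match from such decoys.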
The mass tolerance can be set by the user anywhere from 0.5 to 200 Da; larger tolerances lead to slower searches but do not affect the accuracy of the search, because two scoring methods are used to evaluate how well each peptide sequence fits the tandem mass spectrum. The algorithm can perform a search based only on the molecular weight of the peptide, or it can incorporate the cleavage specificity of any of a variety of proteases that could have been used to generate the peptide. Several chemical modifications, including phosphorylation and acetylation, and nonstandard amino acid residues such as hydroxyproline and hydroxylysine can also be incorporated into the search. Nucleotide databases can be searched by translating the nucleotide sequence 'on the fly' (i.e. during the search) into protein sequences in all six reading frames. When an amino acid sequence falls within the set mass tolerance, the first (preliminary) scoring is performed, in which the predicted fragment ions are matched against the ions in the experimental tandem mass spectrum. The type of fragment ion generated depends on the type of tandem mass spectrometer and ion-activation method used, and the program can be tailored to take these differences into account. A preliminary score is calculated from the number of matched fragment ions, the length of the continuous series of sequence ions, the types of immonium ions present in the spectrum and the total number of predicted sequence ions49. The correct amino acid sequence is frequently identified on the basis of the preliminary score alone; however, a powerful closeness-of-fit method is used to confirm the highest-scoring amino acid sequence and to increase the sensitivity of the search for poor-quality MS/MS data. For the top 500 sequences ranked by preliminary score, this method reconstructs a theoretical tandem mass spectrum by calculating the predicted sequence ions and assigning a relative abundance to each ion type; a cross-correlation function then compares each reconstructed spectrum with the experimental spectrum (Fig. 3), and the normalised cross-correlation value determines the 'correct' answer. This two-step scoring method allows the search to be fully automated, because no information needs to be derived manually from the tandem mass spectrum. In conjunction with microcolumn-LC-MS/MS, SEQUEST therefore provides a powerful approach for rapid, high-throughput analysis and protein identification from complex biological mixtures.
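The two scoring stages can be illustrated with a deliberately simplified Python sketch, under stated assumptions. Spectra are binned at 1 m:z unit; the preliminary score is only the number of predicted singly charged b and y ions that match an observed peak, ignoring the continuity, immonium-ion and total-ion terms described above; every theoretical ion gets the same hypothetical intensity of 50; and the final score follows, in spirit, the background-subtracted correlation of the SEQUEST publication49 (correlation at zero offset minus the mean correlation over offsets of ±75 bins), without SEQUEST's spectrum preprocessing. It is not the SEQUEST implementation.

```python
# Monoisotopic residue masses (Da), standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565
N_BINS = 2000  # assumption: 1-unit m:z bins covering 0-1999


def fragment_mzs(seq):
    """Singly charged b1..b(n-1) and y1..y(n-1) ions of a peptide."""
    ions, b, y = [], 0.0, WATER
    for i in range(len(seq) - 1):
        b += RESIDUE_MASS[seq[i]]
        y += RESIDUE_MASS[seq[-1 - i]]
        ions += [b + PROTON, y + PROTON]
    return ions


def to_vector(peaks):
    """Bin (m:z, intensity) pairs, keeping the most intense peak per bin."""
    vec = [0.0] * N_BINS
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx < N_BINS:
            vec[idx] = max(vec[idx], intensity)
    return vec


def preliminary_score(seq, observed, tol=0.5):
    """Simplified preliminary score: predicted b/y ions within tol of an observed peak."""
    observed_mzs = [mz for mz, _ in observed]
    return sum(any(abs(f - o) <= tol for o in observed_mzs) for f in fragment_mzs(seq))


def xcorr(seq, observed, max_offset=75):
    """Zero-offset correlation minus the mean correlation over offsets -75..+75."""
    theo = to_vector([(mz, 50.0) for mz in fragment_mzs(seq)])  # uniform intensity (assumed)
    expt = to_vector(observed)

    def corr_at(tau):
        # Sum of theo[i] * expt[i - tau], skipping bins that fall off either end.
        return sum(t * expt[i - tau] for i, t in enumerate(theo)
                   if t and 0 <= i - tau < N_BINS)

    corr = [corr_at(tau) for tau in range(-max_offset, max_offset + 1)]
    return corr[max_offset] - sum(corr) / len(corr)


def rank(candidate_seqs, observed, top_n=500):
    """Keep the top_n by preliminary score, then order them by xcorr (best first)."""
    shortlist = sorted(candidate_seqs, key=lambda s: preliminary_score(s, observed),
                       reverse=True)[:top_n]
    return sorted(shortlist, key=lambda s: xcorr(s, observed), reverse=True)


if __name__ == "__main__":
    # Simulated spectrum: the b/y ions of LLK plus one unrelated peak.
    observed = [(mz, 100.0) for mz in fragment_mzs("LLK")] + [(300.0, 20.0)]
    for seq in rank(["AKPG", "LLK"], observed):
        print(seq, preliminary_score(seq, observed), round(xcorr(seq, observed), 1))
```

Subtracting the mean correlation over nearby offsets penalises candidates whose ions merely fall close to observed peaks by chance: AKPG shares no bin with the simulated spectrum at zero offset, so its score is negative, while LLK scores high.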
Gel-based analysis of proteins by tandem mass spectrometry

One- and two-dimensional (1-D and 2-D) gel electrophoresis are the most popular methods for protein separation and purification, and automated Edman degradation has been the method of choice for sequencing proteins obtained from gels; the sensitivity of mass-spectrometry techniques, however, has now surpassed that of conventional sequencing techniques. In the past three years, several groups have demonstrated that gel-based protein separation coupled to MALDI-TOF mass spectrometry allows gel-separated proteins to be identified by database searching with a high level of accuracy and confidence58–61. In some examples, MALDI-TOF mass spectrometry was supplemented by amino acid analysis or PSD of peptide ions to achieve more accurate protein identification. These additional data were needed mostly because of incomplete digestion of the protein, errors in mass assignment due to post-translational modifications, and complications caused by cation- and matrix-adduct peaks accompanying the protonated tryptic fragments of the protein. An alternative approach uses the high specificity of tandem mass spectra to identify gel-separated proteins, generally by combining MS/MS with micro- or nanoelectrospray, microcolumn-LC-electrospray, or microcapillary-electrophoresis-electrospray ionisation. In 1996, Mann and co-workers reported the characterisation of proteins excised from Coomassie- (50–100 ng protein) and silver- (5 ng protein) stained polyacrylamide gels using nanoelectrospray-MS/MS in conjunction with the 'PeptideSearch' program, which uses the sequence-tag approach to database searching62,63. They identified and structurally characterised, at the 400 ng level, a 45 kDa protein that inhibits the proliferation of capillary endothelial cells in vitro and thus may have antiangiogenic effects on solid tumours64. The utility of this technique as an adjunct to peptide-mass mapping has also been demonstrated by identifying various S. cerevisiae proteins separated on 1-D and 2-D gels63. The sequence-tag approach, combined with MS/MS, has also been used to identify a protein in a pathway associated with apoptosis. Figeys et al. have identified yeast proteins from silver-stained 2-D gels using an integrated solid-phase microextraction and capillary-zone electrophoresis peptide-separation device coupled to a tandem mass spectrometer through a microelectrospray-ionisation source, in conjunction with SEQUEST65. With this device they achieved detection limits as low as 600 attomoles (600 × 10⁻¹⁸ mole) for a tryptic digest of bovine serum albumin.

The automated nature of database searching with SEQUEST lends itself to large-scale studies for the rapid identification of proteins separated by 2-D gel electrophoresis. Recently, Link et al. analysed the proteins observed in H. influenzae NCTC 8143: from 303 spots isolated from a 2-D gel, they identified 263 unique proteins. Using an autosampler coupled to microcolumn-LC-MS/MS in conjunction with SEQUEST (searching tandem mass spectra of peptides from tryptic digests against translated genomic sequences; Figs 3 and 4), they were able to identify proteins at a rate of 22 per day. This analysis also revealed that about 9% of the sequenced spots contained a mixture of two or more proteins, indicating that some proteins comigrate even in a 2-D electrophoretic separation. Such studies also help to address fundamental biochemical questions, such as the identities of the most abundant proteins, and to assess the similarity between experimental protein sequences and those predicted from gene sequences; in this study, several frame-shift errors in the nucleotide sequence were corrected and the existence of 25 hypothetical proteins was verified. Interestingly, one protein identified in this H. influenzae strain had been deleted from the Rd strain used to generate the genomic sequence.

[Figure 4 flow chart; recoverable steps: 1-D or 2-D gels of complex protein mixtures → excise spots from 1-D or 2-D gels]

Figure 4 A flow chart illustrating the steps involved in protein identification from 1-D or 2-D gels by tandem mass spectrometry. Details of the in-gel digestion procedure are given in Refs 41 and 42.

Shotgun identification of peptides and proteins from complex mixtures

Identifying proteins in a heterogeneous biological mixture without prior separation (for example, by 1-D or 2-D gel electrophoresis) would allow rapid analysis of multiprotein complexes and protein–protein interactions, as well as surveys of proteins and protein complexes localised to different subcellular compartments, with minimal sample manipulation. We are actively pursuing this approach to identify peptides, and thereby proteins, rapidly using fully automated microcolumn-LC-MS/MS and SEQUEST database searching. In this approach, protein mixtures are proteolytically digested without prior separation and then analysed directly by microcolumn-LC-MS/MS. The techniques and procedures used to achieve direct, rapid identification of proteins from heterogeneous protein mixtures have recently been described. The approach is termed 'shotgun identification' because of its obvious similarity to the highly successful shotgun approach currently employed in genome sequencing.
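Once each tandem mass spectrum from a digested mixture has been matched to a peptide, the matches still have to be assembled into a protein list. The sketch below is a hypothetical post-processing step, not a description of the authors' software: it maps each identified peptide to every protein whose sequence contains it, then tallies distinct peptides and spectra per protein so that proteins supported by several independent spectra stand out. Protein names, sequences and spectrum identifiers are invented placeholders.

```python
from collections import defaultdict


def group_by_protein(psms, proteins):
    """psms: list of (spectrum_id, peptide); proteins: dict of name -> sequence.

    Returns (protein, distinct peptides, spectra) rows, best-supported first.
    A peptide found in several proteins is credited to each of them.
    """
    spectra = defaultdict(set)
    peptides = defaultdict(set)
    for spectrum_id, peptide in psms:
        for name, seq in proteins.items():
            if peptide in seq:
                spectra[name].add(spectrum_id)
                peptides[name].add(peptide)
    summary = [(name, len(peptides[name]), len(spectra[name])) for name in spectra]
    return sorted(summary, key=lambda row: (-row[1], -row[2], row[0]))


if __name__ == "__main__":
    proteins = {  # hypothetical sequences
        "protein_A": "AAKPGGRLLKEDFR",
        "protein_B": "GASPVTLDEKFHYWR",
    }
    psms = [("scan1", "LLK"), ("scan2", "EDFR"), ("scan3", "LLK"), ("scan4", "FHYWR")]
    for name, n_peptides, n_spectra in group_by_protein(psms, proteins):
        print(f"{name}: {n_peptides} peptides, {n_spectra} spectra")
```

Counting distinct peptides separately from spectra keeps repeated acquisitions of the same peptide from inflating the apparent support for a protein.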
This type of analysis affords good reproducibility, dynamic range and sensitivity, and has been demonstrated using a test protein mixture with widely varying concentrations and molecular weights. Using the direct microcolumn-LC-MS/MS technique and SEQUEST database searching, we have been able to characterise proteins enriched by immunoaffinity interactions, interactions with glutathione-S-transferase fusion proteins and interactions with macromolecular complexes50. Variations of this approach have been successfully applied by McCormack et al. to identify antigenic peptides isolated from murine major histocompatability (MHC) class II molecules”7,67 and by Link et al. to survey the identities of E. coli periplasmic protein@. We are currently using it to pursue the largescale identification of antigenic peptides isolated from murine MHC-class-II molecule IAb. MHC-class-II molecules were isolated by immunoaffinity purification and the bound peptides eluted and identified using microcolumn-LC-MS/MS. Approximately 150 antigenie peptides, constituting approximately 50-60 endogenous proteins, were identified (Dongre, A. R. et al., unpublished). We are also currently identifying TIBTECH OCTOBER 1997 (VOL 15)
protein fragments present in the endosomal and lysosomal compartments of antigen-presenting cells using this approach, in order to understand how MHC-class-II complexes form and how they are trafficked from the endoplasmic reticulum to the cell surface (Dongré, A. R. et al., unpublished).

Future prospects

The ability to identify proteins rapidly and accurately by mass spectrometry has been an unexpected benefit of the genome projects. Mass spectrometry, and tandem mass spectrometry in particular, will be an important tool for high-throughput functional studies of gene products. Protein identification using tandem mass spectra has several important advantages. First, an identification is possible on the basis of a single peptide spectrum. Second, each tandem mass spectrum represents an independent piece of information, so additional spectra that match the same protein add considerable strength to the identification. Third, because a protein can be identified from a single tandem mass spectrum, proteins present in complex mixtures can be identified. Finally, post-translational modifications do not appear to complicate the identification, and computer programs can help to locate them at the specific modified site within the amino acid sequence. In the not-too-distant future, most protein biochemical studies will benefit from the ability to identify proteins rapidly by mass spectrometry.

Acknowledgments

The authors acknowledge financial support from grants from the National Institutes of Health (R01 GM 52095) and the National Science Foundation, Science and Technology Center (BIR 9214821). A. R. Dongré is supported by a Howard Hughes Medical Institute Research Associateship. The authors also thank A. Y. Rudensky for important suggestions.

References

1 Barber, M., Bordoli, R. S., Sedgwick, R. D. and Tyler, A. N. (1981) J. Chem. Soc. Chem. Commun. 197, 401-404
2 Fenselau, C.
and Cotter, R. J. (1987) Chem. Rev. 87, 501-512
3 Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F. and Whitehouse, C. M. (1989) Science 246, 64-71
4 Smith, R. D., Loo, J. A., Edmonds, C. G., Barinaga, C. J. and Udseth, H. R. (1990) Anal. Chem. 62, 882-899
5 Karas, M., Bachmann, D., Bahr, U. and Hillenkamp, F. (1987) Int. J. Mass Spectrom. Ion Processes 78, 53-68
6 Hillenkamp, F., Karas, M., Beavis, R. C. and Chait, B. T. (1991) Anal. Chem. 63, 1193A-1203A
7 Hunt, D. F. et al. (1992) Science 255, 1261-1263
8 Hunt, D. F. et al. (1992) Science 256, 1817-1820
9 Bean, M. F., Carr, S. A., Thorne, G. C., Reilly, M. H. and Gaskell, S. J. (1991) Anal. Chem. 63, 1473-1481
10 Hunkapiller, T., Kaiser, K. J., Koop, B. F. and Hood, L. E. (1991) Science 251, 59-62
11 Olson, M. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4338-4344
12 Hunt, D. F., Yates, J. R., III, Shabanowitz, J., Winston, S. and Hauer, C. R. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 6233-6237
13 Biemann, K. (1988) Biomed. Environ. Mass Spectrom. 16, 99-110
14 Biemann, K. (1990) in Methods in Enzymology (Vol. 193) (McCloskey, J. A., ed.), pp. 351-360, Academic Press
15 Biemann, K. (1990) in Methods in Enzymology (Vol. 193) (McCloskey, J. A., ed.), pp. 455-479, Academic Press
16 Biemann, K. and Scoble, H. A. (1987) Science 237, 992-998
17 Roepstorff, P. and Fohlman, J. (1984) Biomed. Mass Spectrom. 11, 601-603
18 Medzihradszky, K. F. and Burlingame, A. L. (1994) in Methods: A Companion to Methods in Enzymology (Vol. 6), pp. 284-303, Academic Press
19 McCormack, A. L., Eng, J. and Yates, J. R., III (1994) in Methods: A Companion to Methods in Enzymology (Vol. 6), pp. 274-283, Academic Press
20 Papayannopoulos, I. A. (1995) Mass Spectrom. Rev. 14, 49-73
21 Hunt, D. F., Shabanowitz, J. and Yates, J. R., III (1987) J. Chem. Soc. Chem. Commun. 548-550
22 Johnson, R. S., Martin, S. A., Biemann, K., Stults, J. T. and Watson, J. T. (1987) Anal. Chem. 59, 2621-2625
23 McCormack, A. L., Somogyi, A., Dongré, A. R. and Wysocki, V. H. (1993) Anal. Chem. 65, 2859-2872
24 Dongré, A. R., Somogyi, A. and Wysocki, V. H. (1996) J. Mass Spectrom. 31, 339-350
25 Martin, S. A., Hill, J. A., Kittrell, C. and Biemann, K. (1990) J. Am. Soc. Mass Spectrom. 1, 107-109
26 Senko, M. W., Beu, S. C. and McLafferty, F. W. (1994) Anal. Chem. 66, 415-417
27 Cox, K. A., Williams, J. D., Cooks, R. G. and Kaiser, R. E., Jr (1992) Biol. Mass Spectrom. 21, 226-241
28 Alexander, A. J., Thibault, P., Boyd, R. K., Curtis, J. M. and Rinehart, K. L. (1990) Int. J. Mass Spectrom. Ion Processes 98, 107-134
29 Jones, J. L., Dongré, A. R., Somogyi, A. and Wysocki, V. H. (1994) J. Am. Chem. Soc. 116, 8368-8369
30 Dongré, A. R., Jones, J. L., Somogyi, A. and Wysocki, V. H. (1996) J. Am. Chem. Soc. 118, 8365-8374
31 Wong, S. F., Meng, C. K. and Fenn, J. B. (1988) J. Phys. Chem. 92, 546-550
32 Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F. and Whitehouse, C. M. (1990) Mass Spectrom. Rev. 9, 37-70
33 Smith, R. D., Loo, J. A., Edmonds, C. G., Barinaga, C. J. and Udseth, H. R. (1990) Anal. Chem. 62, 882-899
34 Smith, R. D., Loo, J. A., Ogorzalek Loo, R. R., Busman, M. and Udseth, H. R. (1991) Mass Spectrom. Rev. 10, 359-452
35 Fenn, J. B. (1993) J. Am. Soc. Mass Spectrom. 4, 521-535
36 Hofstadler, S. A., Bakhtiar, R. and Smith, R. D. (1996) J. Chem. Educ. 73, A82-A88
37 Bakhtiar, R., Hofstadler, S. A. and Smith, R. D. (1996) J. Chem. Educ. 73, A118-A123
38 Gale, D. C. and Smith, R. D. (1993) Rapid Commun. Mass Spectrom. 7, 1017-1021
39 Emmett, M. R. and Caprioli, R. M. (1994) J. Am. Soc. Mass Spectrom. 5, 605-613
40 Wilm, M. S. and Mann, M. (1994) Int. J. Mass Spectrom. Ion Processes 136, 167-180
41 Wilm, M. S. and Mann, M. (1996) Anal. Chem. 68, 1-8
42 Griffin, P. R., Coffman, J. A., Hood, L. E. and Yates, J. R., III (1991) Int. J. Mass Spectrom. Ion Processes 111, 131-149
43 Gluckman, J. C., Hirose, A., McGuffin, V. L. and Novotny, M. (1983) Chromatographia 17, 303-309
44 Kennedy, R. T. and Jorgenson, J. W. (1989) Anal. Chem. 61, 1128-1135
45 Davis, M. T., Stahl, D. C. and Lee, T. D. (1995) J. Am. Soc. Mass Spectrom. 6, 571-577
46 Stahl, D. C., Martino, P. A., Swiderek, K. M., Davis, M. T. and Lee, T. D. (1992) Proceedings of the 39th ASMS Conference on Mass Spectrometry and Allied Topics, pp. 1801-1802, American Society for Mass Spectrometry
47 Yates, J. R., III, McCormack, A. L., Link, A. J., Schieltz, D., Eng, J. K. and Hays, L. G. (1996) Analyst 121, 65R-76R
48 Stahl, D. C., Swiderek, K. M., Davis, M. T. and Lee, T. D. (1996) J. Am. Soc. Mass Spectrom. 7, 532-540
49 Eng, J. K., McCormack, A. L. and Yates, J. R., III (1994) J. Am. Soc. Mass Spectrom. 5, 976-989
50 McCormack, A. L. et al. (1997) Anal. Chem. 69, 767-776
51 Cottrell, J. S. (1994) Pept. Res. 7, 115-124
52 Pappin, D. J. C., Hojrup, P. and Bleasby, A. J. (1993) Curr. Biol. 3, 327-332
53 Jensen, O. N., Podtelejnikov, A. and Mann, M. (1996) Rapid Commun. Mass Spectrom. 10, 1371-1378
54 Mann, M. and Wilm, M. S. (1994) Anal. Chem. 66, 4390-4399
55 Mortz, E. et al. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 8264-8267
56 Patterson, S. D., Thomas, D. and Bradshaw, R. A. (1996) Electrophoresis 17, 877-891
57 Yates, J. R., III, Eng, J. K. and McCormack, A. L. (1995) Anal. Chem. 67, 3202-3210
58 Cordwell, S. J., Wilkins, M. R., Cerpa-Poljak, A., Duncan, M., Williams, K. L. and Humphery-Smith, I. (1995) Electrophoresis 16, 438-443
59 Patterson, S. D. and Aebersold, R. (1995) Electrophoresis 16, 1791-1814
60 Wheeler, C. H. et al. (1996) Electrophoresis 17, 580-587
61 Gevaert, K. et al. (1996) Electrophoresis 17, 918-924
62 Wilm, M. et al. (1996) Nature 379, 466-469
63 Shevchenko, A., Wilm, M., Vorm, O. and Mann, M. (1996) Anal. Chem. 68, 850-858
64 Muzio, M. et al. (1996) Cell 85, 817-827
65 Figeys, D., Ducret, A., Yates, J. R., III and Aebersold, R. (1996) Nat. Biotechnol. 14, 1579-1583
66 Link, A. J., Hays, L. G., Carmack, E. B. and Yates, J. R., III (in press) Electrophoresis
67 McCormack, A. L., Eng, J. K., DeRoos, P. C., Rudensky, A. Y. and Yates, J. R., III (1995) in Biochemical and Biotechnological Applications of Electrospray Ionization Mass Spectrometry (Snyder, A. P., ed.), pp. 217-225, American Chemical Society
68 Link, A., Carmack, E. and Yates, J. R., III (1996) Int. J. Mass Spectrom. Ion Processes 160, 303-316

TIBTECH OCTOBER 1997 (VOL 15)