CHAPTER 8
Intact Protein Mass Measurements and Top-Down Mass Spectrometry: Application to Integral Membrane Proteins
Julian P. Whitelegge
Contents
1. Introduction
2. Intact Protein Mass Measurements
2.1 Sample preparation and separations for integral membrane and other proteins
3. Ionization
3.1 Dissociation of intact proteins
3.2 Data interpretation
3.3 Future considerations
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00208-0. © 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION
The ease with which peptides can be delivered to mass spectrometers, subjected to automated tandem mass spectrometry, and the resulting data screened for matches to a protein sequence database has resulted in an overwhelming predominance of ‘bottom-up’ proteomics strategies. Because the mass spectrometer is more sensitive to peptides than to proteins, a larger proportion of the proteome is available to bottom-up proteomics. However, a growing body of mass spectrometrists is embracing intact protein mass measurements and sophisticated ‘top-down’ tandem mass spectrometry experiments on intact proteins, because they realize that proteomic information is lost when the individual proteins of the proteome are cleaved into a complex mixture of small peptides. Furthermore, bottom-up strategies favor peptides that are easily recovered with robust ionization properties (proteotypic peptides), such that proteins or regions of proteins with unfavorable properties will be selected against. Thus, while integral membrane proteins (IMPs) constitute around 30% of the proteome, their transmembrane domains are easily excluded from the average bottom-up proteomics experiment. Progress has been made in this respect by improving protocols for membrane protein extraction and digestion [1,2]. In this chapter we show how top-down mass spectrometry provides a route toward proteomics experiments that embrace the transmembrane domain by addressing the whole intact protein.
2. INTACT PROTEIN MASS MEASUREMENTS
The mass spectrum of an intact protein defines the native covalent state of the gene product and its heterogeneity. To better understand the sort of information that can be lost in ‘shotgun’ proteomics strategies we will consider the PsbH protein from spinach (Spinacia oleracea). The measured mass of the protein is 7,598 Da (Figure 1). To calculate the predicted mass of the protein one first visits the protein sequence database at NCBI (http://www.ncbi.nlm.nih.gov/). A search for ‘psbh spinach’ returns five entries because NCBI keeps a ‘redundant’ database where new information is added as a new entry, rather than by updating a single entry. Among the five entries, visual inspection reveals one from the SwissProt database. Not all proteins have a SwissProt (sp) entry yet, but it is nearly always best to use the SwissProt entry if there is one, because this database is maintained in a non-redundant state. The entry is annotated with the latest information on that protein, so it is usually (but not always) the most reliable source of information, including the primary protein sequence. Annotation of SwissProt entries is not immediate, however: data must be published and updates then submitted manually.

The primary sequence and information on known post-translational modifications are found at the bottom of the SwissProt entry and can be taken for mass calculation. In the case of spinach PsbH, the SwissProt entry (P05146) includes two sequence ‘conflicts’ because a later sequencing effort disagreed with an earlier version. Steve Gómez actually predicted the sequencing errors based upon the intact mass measurement and a consideration of sequence conservation across a wide range of PsbH sequences [3]. According to the SwissProt entry the initiating Met residue is removed such that the mature form covers amino acids 2–73 (‘mature chain’).
The average mass can then be calculated using the link to ‘PeptideMass’, a mass calculator at the EXPASY informatics site (http://us.expasy.org/tools/peptide-mass.html). Note that the mass setting is ‘M’ rather than ‘M+H+’ and ‘average’ rather than ‘monoisotopic’ mass. Cys residues are unmodified (there are none in PsbH) and the enzyme is set to ‘no cutting’. Of course, there are many other mass calculators but PeptideMass is generally reliable. PeptideMass returns the calculated mass for residues 2–73 as 7,598.8559 Da, in reasonable agreement with the measured mass (Figure 1). Mass accuracy on quadrupole mass spectrometers is around 0.01% (100 ppm), giving a margin of error of 0.76 Da for the PsbH measurement.

The SwissProt entry for spinach PsbH mentions a single phosphorylation site at Thr3. Unfortunately, PeptideMass does not have a convenient way to include mass calculations for modified forms, so one must look up the delta mass for phosphorylation and add this to the calculated mass. The Delta Mass tool maintained by ABRF is useful in this respect (http://www.abrf.org/index.cfm/dm.home). If you cannot find the modification online it will be necessary to consider the changes to the atomic formula of the molecule introduced by the modification. Thus for phosphorylation we add +80 Da to the calculated mass (7,678 Da).

Figure 1. The mass spectrum of an intact protein defines the native covalent state of the gene product and its heterogeneity. Intact protein electrospray ionization mass spectrometry was used to profile the PsbH protein from spinach thylakoid membranes prepared from plants incubated in low light versus those exposed to high light for 45 min. The peak for the unmodified protein has a mass of 7,598 Da and phosphorylation adducts (+80 Da) are notable. PsbH is known to have two phosphorylation sites though double phosphorylation can be seen only in the high light sample (7,758). Minor oxidative modifications (+16 Da, circles) can be seen on each phosphorylated form distributed evenly between all three states. Under high light conditions an as yet unidentified +32 Da modification (squares), probably also oxidative, appears preferentially associated with the phosphorylated forms of PsbH, one on the singly phosphorylated species (7,710) and two on the doubly phosphorylated species (7,792; 7,825). Thus the +32 Da modification is related or linked to the +80 Da phosphorylation. Whether one increases the probability of the other will need testing in more developed experiments. Since the two modifications are likely on different sites, a proteolytic cleavage that separates the two modifications (as in a bottom-up proteomics experiment) would result in loss of the ‘linkage’ information. Modified from Gómez and coworkers [4]. Permission obtained from American Society for Biochemistry and Molecular Biology, 2002.

Figure 1 shows that in the low light sample the singly phosphorylated form is more abundant than the unmodified form, on the assumption that the two species have the same ionization efficiency. For intact proteins this assumption is usually satisfactory but this is not always the case with peptides. For this reason such a conclusion about abundance is usually described as ‘semi-quantitative’. Absolute quantification in mass spectrometry is achieved using internal standards while relative quantification can be achieved using isotopic labeling strategies (see Warscheid, Chapter 17). The minor peaks in the low light spectrum probably correspond to ‘noise’. The different measured masses for PsbH are called intact mass tags (IMTs) and it should be noted that a single protein can give rise to many. Note also that if there were alternative phosphorylation sites on this protein, singly phosphorylated isoforms with the modification at different sites would have the same IMT.

The high light spectrum of PsbH (Figure 1) is more complex. A second phosphorylation is apparent. Its presence is supported by a SwissProt entry for Arabidopsis PsbH that reports phosphorylation at Thr5 as well as Thr3 (P56780). Thus consideration of what is happening in one species can help interpret what is happening in another. Such logic underlies comparative physiology and biochemistry, a discipline that draws little attention at this time. The mass calculated for the doubly phosphorylated species is in reasonable agreement with that measured (7,758 Da). ‘Reasonable agreement’ is a ‘woolly’ term that simply means that calculated and measured masses are ‘within measurement error’, 100 ppm in this case. Confidence can be dramatically boosted by working with accurate mass measurements with mass accuracy better than 5 ppm, though distinguishing protein phosphate from sulfate modifications, for example, requires still higher mass accuracy. Besides phosphorylation, there are other adducts appearing in the high light PsbH mass spectrum. A +16 Da modification appears at low levels on all three phosphoforms, marked with the solid black circle. This is likely due to Met oxidation to its sulfoxide (MetO).
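The arithmetic behind the calculated masses and the 100 ppm window discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the PeptideMass implementation: the residue masses are commonly tabulated average values, the function names are hypothetical, and no PsbH sequence is embedded (only the calculated mass quoted in the text is used).

```python
# Average residue masses (Da); commonly tabulated values, treat as approximate.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528    # H2O added once for the free N- and C-termini
PHOSPHO_AVG = 79.9799   # HPO3, the "+80 Da" of phosphorylation (average mass)


def average_mass(sequence: str) -> float:
    """Average neutral mass (M, not M+H+) of an unmodified, uncleaved chain."""
    return sum(AVG_RESIDUE[aa] for aa in sequence) + WATER_AVG


def tolerance_da(mass: float, ppm: float) -> float:
    """Convert a relative mass accuracy in ppm into an absolute window in Da."""
    return mass * ppm / 1e6


def ppm_error(measured: float, calculated: float) -> float:
    """Signed difference between measured and calculated mass, in ppm."""
    return (measured - calculated) / calculated * 1e6


calculated = 7598.8559  # value quoted in the text for residues 2-73
print(round(tolerance_da(calculated, 100), 2))   # width of the 100 ppm window, Da
print(round(calculated + PHOSPHO_AVG, 1))        # singly phosphorylated form
print(round(calculated + 2 * PHOSPHO_AVG, 1))    # doubly phosphorylated form
```

For a modification that is not tabulated, the same approach applies: work out the change in atomic formula and add the corresponding mass in place of `PHOSPHO_AVG`.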
More striking, however, is the appearance of a +32 Da adduct seen only on the phosphorylated forms. The nature of the modification is unknown but an oxidative addition of two oxygen atoms is a reasonable hypothesis. The doubly phosphorylated form appears to have a sub-population with two of these modifications, while the singly phosphorylated species exhibits just one. Thus there appears to be a link between the +32 Da modification and phosphorylation. The biological significance of this link is not yet clear and one can speculate that one modification leads to the other or that one is a response to the other, and so on. The mass spectral output provides information that allows us to develop new hypotheses for future testing. What is clear is that if the +32 Da modification is happening at a different site from the N-terminal phosphorylations, a bottom-up proteomics experiment where the protein is proteolyzed could lead to separation of the two pieces of information [4]. The histone code is now being considered in this context and Allis has speculated on binary switches involving methylation and phosphorylation of adjacent residues [5]. It is likely that in the coming years the paradigm of single modifications acting as on/off switches will be expanded to include much more complicated logic through multiple modifications.

How did we know the protein was PsbH? It was possible to match many thylakoid membrane proteins to their intact masses because post-translational modifications are minimal and predictable in most cases in the chloroplast [4]. PsbH is the only thylakoid protein in that size range. The TagIdent tool at EXPASY (http://us.expasy.org/tools/tagident.html) can be used to search for database entries by intact mass alone, but the user must be aware that it does not take modifications into account, because it relies on a mass calculated from the complete genomic translation. In many other systems post-translational modification is more extensive, and assignments based upon intact mass alone become unreliable; any match is at best coincidental.

Identification of a protein from its mass can be accomplished in two ways. Ions from the intact protein can be isolated in the mass spectrometer for tandem mass spectrometry (see Section 3.1 on top-down MS), or samples collected concomitantly with elution of the IMT can be subjected to chemical cleavage (CNBr) or digestion (trypsin) to yield peptides for bottom-up tandem mass spectrometry. The only direct way to identify the IMT is through top-down MS, because the bottom-up approach could identify several proteins in a collected fraction, providing several candidates for the IMT. In the case of bottom-up analysis it becomes necessary that an N- or C-terminal fragment is analyzed by mass spectrometry such that the bottom-up data and the intact mass are consistent with a single processed species. Since the intact mass tag summarizes the primary structure of the protein, proteome-wide studies of IMTs can provide powerful insights. Analysis of the IMTs for nuclear-encoded thylakoid membrane proteins imported from the cytoplasm allowed us to classify three different membrane insertion mechanisms based upon patterns of transit peptide cleavage [6].
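Because a search by intact mass alone ignores modifications, it can help to enumerate modified forms explicitly when interpreting a set of IMTs. The sketch below is a hypothetical helper, not part of TagIdent: it tries every combination of a few candidate modifications on one calculated mass and reports those that fall within a tolerance of a measured mass. The 1.5 Da tolerance is an assumption chosen because the masses quoted in Figure 1 are rounded to whole daltons, and treating the +32 Da adduct as two oxygen atoms follows the hypothesis stated in the text rather than an established assignment.

```python
from itertools import product

BASE_MASS = 7598.8559             # calculated PsbH mass quoted in the text
MODS = {
    "phospho": 79.9799,           # HPO3, average mass
    "oxidation": 15.9994,         # one oxygen atom, average mass
}


def explain_intact_mass(measured, base_mass=BASE_MASS, mods=MODS,
                        max_each=2, tol_da=1.5):
    """Return (modification counts, predicted mass) pairs matching `measured`."""
    names = list(mods)
    hits = []
    # Try 0..max_each copies of every modification in all combinations.
    for counts in product(range(max_each + 1), repeat=len(names)):
        predicted = base_mass + sum(n * mods[name]
                                    for n, name in zip(counts, names))
        if abs(predicted - measured) <= tol_da:
            hits.append((dict(zip(names, counts)), round(predicted, 2)))
    return hits


for imt in (7598.0, 7758.0, 7710.0):   # masses quoted in Figure 1
    print(imt, explain_intact_mass(imt))
```

A match of this kind only shows that a combination is consistent with the mass; as noted above, isoforms modified at different sites share the same IMT, so site assignment still requires tandem mass spectrometry.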
2.1 Sample preparation and separations for integral membrane and other proteins
The right sample preparation and purification workflow can turn a seemingly impossible task into a routine one. IMPs are amenable to both electrospray ionization (ESI) and matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry provided they can be purified away from salts, lipids, and detergents in aqueous/organic solvent mixtures. The best way to achieve this is typically determined empirically but some general trends can be related. IMPs are best left in their native configuration until sample preparation, either in the membrane or as native complexes extracted in mild non-ionic detergents. Disturbance of the tertiary and secondary structure, by running an SDS gel, for example, often renders IMPs highly susceptible to aggregation. Often it is beneficial to precipitate IMPs with organic solvents in order to remove some lipids and detergents but this can render the precipitate difficult to dissolve. High concentrations of organic acids (usually formic acid) are often then necessary for solubilization, with immediate HPLC to transfer the proteins to a less reactive environment. In the case of thylakoid membrane proteins it is necessary to precipitate the proteins with acetone, stripping them of bound chlorophyll and other cofactors. The Halobacterium purple membrane, dominated by bacteriorhodopsin, can be conveniently solubilized directly into formic acid without precipitation. Mammalian membrane systems can be very resistant to dissolution and require detergent disruption followed by organic precipitation to render the proteins suitable for analysis. Clues to successful preparation of a membrane system can sometimes be found in older literature, under unlikely titles demanding time-consuming manual searches in the library. The goal is always the same: to generate a protein-enriched sample that can be quickly solubilized in formic acid for immediate HPLC.

Residual amounts of lipids and detergents should be expected and it is necessary that the chromatography system in use should separate the protein of interest from such contaminants. A suite of chromatography systems has been described for analysis of membrane proteins [7] and these are illustrated in Figure 2. While we originally described a system that involved a high concentration of formic acid in the aqueous phase throughout the separation [8], systems that rapidly separate the protein from excess formic acid are now favored, due to the potential for covalent formylation of the protein (+28 Da adducts). Liquid chromatography (HPLC) combined with online ESI-MS (LC-MS) has been used for all our method development efforts, allowing us to monitor the covalent status of the eluting proteins and to include minimal modification as a criterion in successful method development. Methods reported in the literature that were developed without online MS as a readout of analyte integrity should not be assumed to be useful without testing this criterion. Thus while elevated temperatures are often helpful in HPLC separations, excessive temperature can accelerate undesirable chemistry.

The thylakoid membrane cytochrome b6f complex was analyzed by size-exclusion chromatography (SEC) and reverse-phase chromatography (RPC) coupled with online ESI mass spectrometry (Figure 2). In both cases the sample was prepared by acetone precipitation prior to dissolution in formic acid (90% in water, v/v) and immediate injection to HPLC. The separation on the SEC system used is limited such that larger proteins elute in the 6–9 min range and smaller ones over 8–11 min (Figure 2A).
SEC works very well in its size-exclusion context and the larger protein mass spectrum is free from interference from small proteins or other small molecules in the sample or the formic acid. The four larger subunits (17–35 kDa) were not separated from each other but could be deconvoluted from the composite mass spectrum (Figure 2B). The smaller subunits elute in the tail of the larger subunits but could also be deconvoluted from the mass spectrum [9].

The separation achieved using RPC is much better than SEC and worked well for the cytochrome b6f sample (Figure 2C). Note that smaller subunits (PetL, N, M, G) tend to give stronger ion currents than the larger ones (PetB, PetC), though this is dependent on individual ionization efficiencies achieved under the conditions used, and cyt f and PetD gave strong signals. An essential feature of this work is the low chemical background achieved when no proteins are eluting, such that reasonable signal/noise is achieved even with poorly ionizing proteins. PetB has four transmembrane helix domains and was the most challenging IMP of this analysis; despite relatively low ionization efficiency the signal to noise is excellent (Figure 2D) and the molecular mass profile clearly deconvoluted. Note however that the most intense ion in the spectrum was derived from a singly oxidized (+16 Da) isoform of PetN whose retention was shortened by the modification. The cytochrome b subunit (PetB) was concluded to have a covalently bound heme group (+615 Da) based upon the difference between measured and calculated masses [9] and subsequent crystallography revealed the presence of a Cys-linked c-type heme, as well as the two non-covalently associated b-type hemes known to be associated with the complex [10,11]. The X-ray structure confirmed that the LC-MS analysis included all the subunits of the complex.

Figure 2. The right sample preparation and purification workflow makes analysis of a membrane protein complex routine. A sample of spinach cytochrome b6f complex (300 µg protein) was precipitated with acetone (80%, v/v, −20 °C, 1 h) and dissolved in 90% formic acid for immediate LC-MS analysis. (A) Size-exclusion separation at 250 µL/min in chloroform/methanol/1% aqueous formic acid (4/4/1, v/v) using a silica support (SW2000 XL, 4.6 mm × 30 cm, Tosoh Biosciences, Montgomeryville, PA) at 40 °C. (B) Reverse-phase separation at 100 µL/min in aqueous/organic trifluoroacetic acid (0.1% TFA) using a polystyrene-divinylbenzene copolymer support (PLRP/S, 300 Å, 2 mm × 15 cm, Varian Inc., Palo Alto, CA) at 40 °C. The column was equilibrated at 95% A (0.1% TFA in water), 5% B (0.05% TFA, 50% acetonitrile, 50% isopropanol) for 30 min before sample injection. A compound linear gradient was initiated 5 min after injection, ramping to 40% B at 30 min and 100% B at 150 min. Column eluent was directed to the ESI source of a triple quadrupole mass spectrometer (API III+, PE Sciex, Concord, Canada) via a liquid flow splitter that delivered approximately half the eluent flow to a fraction collector (LC-MS+). Data were processed using BioMultiView software. Modified from Whitelegge and coworkers [9]. Permission obtained from American Society for Biochemistry and Molecular Biology, 2002.

Recovery of IMPs is not always quantitative. Some dispersed, aggregated protein can be captured on the 0.2 µm filters used to protect columns while more can make its way through the filter but end up in a bound/insoluble state (Figure 3A).
Blank injections of formic acid are generally effective at cleaning the filter in the chloroform/methanol/aqueous formic acid solvent used for SEC, though abundant proteins such as bacteriorhodopsin can take several injections to clear. Occasional cleaning of the filter with nitric acid is recommended. The bound insoluble material can be shifted to the mobile phase by equilibration of the reverse-phase column in the chloroform/methanol/aqueous formic acid buffer used for SEC prior to making a formic acid injection. By inserting the size-exclusion column in line after the reverse-phase column, the released protein can be separated from the formic acid for improved mass spectrometry and UV quantification.

The addition of isopropanol to HPLC buffers to enhance elution of IMPs was noted by Tarr and Crabb [12]. The utility of the polymeric column at elevated temperature was described by Bowyer and colleagues [13]. The chloroform/methanol precipitation protocol described by Wessel and Flugge [14] has been useful for much of our work but readers are warned that some IMPs can partition into the chloroform phase, where they can be recovered by SEC as described. While the chromatographic systems described here have performed well for thylakoid membrane proteins as well as bacterial systems, there is undoubtedly a need for development of chromatographic systems for membrane proteins from mammalian systems. Hydrophilic-interaction chromatography (HILIC) has proved useful in mitochondrial systems with aqueous organic extracts being loaded onto polyhydroxyethyl aspartamide columns equilibrated at high organic concentrations and then eluted with a gradient of decreasing organic concentration [15].
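The compound linear gradient given in the Figure 2 legend for the reverse-phase separation can be written down as a small piecewise-linear program, which is convenient when transferring the method to another pump controller. This is a sketch under one stated assumption: all times in the legend are taken to be minutes after injection, with %B held at its final value after 150 min. The names `GRADIENT` and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# (minutes after injection, %B) breakpoints read from the Figure 2 legend.
# Assumption: 5% B is held for the first 5 min, then 40% B is reached at
# 30 min and 100% B at 150 min, all counted from injection.
GRADIENT = [(0.0, 5.0), (5.0, 5.0), (30.0, 40.0), (150.0, 100.0)]


def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate %B at time t_min within the gradient program."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]      # hold final composition
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0, 5, 17.5, 30, 90, 150):
    print(t, percent_b(t))
```

The two segments have quite different slopes (1.4% B per min, then 0.5% B per min), which is what makes the gradient ‘compound’.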
Figure 3. Ionization is straightforward provided the protein is in solution in a suitable solvent. (A) Reverse-phase separations tend to retain sub-populations of some IMPs in a bound, insoluble sink. This sub-population can be shifted to the soluble phase by equilibrating the column in the buffer used for SEC in Figure 2 (100 µL/min), and making a formic acid injection. (B) By eluting the reverse-phase column as described in A through the size-exclusion column, the released protein can be separated from the formic acid for mass spectrometry. Figure 3A was modified from Whitelegge and coworkers [23] with permission of Future Medicine Ltd., London, 2006. Figure 3B was modified from Whitelegge and coworkers [24]. Permission obtained from Elsevier Ltd., Oxford, 2005.
3. IONIZATION
Ionization, both ESI and MALDI, is straightforward provided the protein is in solution in a suitable solvent, typically an aqueous organic mixture lacking nonvolatile salts or detergents. It is important to note that in the case of MALDI-TOF the matrix solution solvent should be identical to, or compatible with, that of the sample. If the protein precipitates upon mixing with the matrix solution the experiment will be unsuccessful.

There has been little experimentation beyond ESI and MALDI for ionization of membrane proteins. Fast-atom bombardment (FAB) ionization was successful for a small proteolipid [16] and Halgand and coworkers used atmospheric pressure photoionization (APPI) for analysis of a hydrophobic peptide [17]. Since both techniques tend to produce predominantly singly charged ions it is unlikely they will find general favor in membrane protein research. One new development that may hold some promise for the future is a technique called laser-induced liquid bead ionization desorption (LILBID) [18]. In this technique, aqueous microdroplets are gently excited with infra-red laser photons, resulting in generation of low charge-state ions of intact complexes at the lowest fluences and of intact subunits at higher fluences. LILBID has already been applied to complexes III and IV of the respiratory chain from a bacterial source [18]. One can conceive of combining a technique such as this with an electrospray plume in order to multiply charge the subunits of a desorbed complex in a manner analogous to MALDESI [19]. The microdroplets of LILBID are aqueous such that detergent solubilized micelles can be analyzed without complex chromatography separations. Another possibility for membrane protein research is field-induced droplet ionization, under investigation by Beauchamp and colleagues [20]. Analogous to ESI, this process also appears to be applicable to solutions containing salts for working with native conditions. Clearly, there is an exciting future with discoveries open to young scientists.
3.1 Dissociation of intact proteins
The beauty of tandem mass spectrometry is the highly efficient direct purification one can achieve with the m/z selective filter. Thus a molecule of specific molecular mass can be isolated from a relatively complex mixture with ease, very quickly. While there are only two basic dissociation chemistries available for tandem mass spectrometry, a variety of intact protein gas-phase dissociation strategies provide versatile options for top-down mass spectrometry of IMPs.

The first dissociation mechanism available is collision-activated dissociation (CAD) that combines the kinetic energy of the selected precursor ions and collisions with inert gases (Ar and He are common) to thermally (vibrationally) excite the ions. With sufficient energy input, CAD occurs, resulting in a backbone cleavage at the peptide bond generating b ions that include the N-terminus and y ions that include the C-terminus of the peptide or protein (see Chapter 1). CAD was first applied to intact proteins by Loo and coworkers shortly after the discovery of ESI [21]. CAD has been used for analysis of membrane protein primary structure on triple quadrupole [9,22], quadrupole TOF [9,23] and, recently, Fourier-transform ion cyclotron resonance (FT-ICR) mass analyzers [24,25]. CAD tends to produce distinct patterns of product ions as some bonds are more easily cleaved than others, so full sequence coverage should not be expected. It should be noted that multiply charged ions fragment much more readily than singly charged ions by CAD so that ESI is the only practical option for top-down mass spectrometry. Furthermore, different charge states of the same molecular ion can require different threshold dissociation energies and may yield different fragmentation patterns.

More recently, a second mechanism, electron-capture dissociation (ECD), was discovered [26].
Zubarev and McLafferty describe a non-ergodic mechanism whereby low energy electrons are reacted with multiply charged positive ions in an FT-ICR cell. ECD occurs before the energy of the excited ion equilibrates such that sites of N–Cα cleavage are largely sequence independent, typically yielding better sequence coverage than CAD. An early observation with ECD was that larger proteins (>20 kDa) tended to exhibit charge reduction rather than fragmentation and it was proposed that tertiary structure would hold the protein together despite cleavage events such that the molecule appeared uncleaved. The use of infra-red irradiation to thermally excite gas phase ions such that they become denatured, concomitant with ECD, alleviates this problem. Such thermal excitation allows for activated ion ECD (aiECD). The level of thermal excitation is adjusted to avoid excessive excitation and backbone cleavage by the CAD mechanism. Infra-red multi-photon dissociation (IRMPD) and black-body infra-red dissociation (BIRD) describe thermal excitation experiments that deliberately cleave the peptide backbone by the CAD mechanism.

The FT-ICR cell is ideal for ECD because of the ease of bringing together negative electrons with positive ions. Photons are also easily introduced with the use of an infra-red laser for aiECD and IRMPD. CAD is now usually performed outside of the ICR cell with subsequent transmission of product ions to the cell, using hybrid ion trap or quadrupole FT-ICR systems that allow the cell to be maintained at optimal pressure. Top-down FT-ICR experiments have been performed on bacteriorhodopsin, a seven-transmembrane helix IMP, and the c-subunit of the ATP synthase Fo that has two transmembrane helices [24,25]. In the case of the c-subunit, ECD alone resulted in charge reduction and it was necessary to use aiECD for efficient dissociation of this 8 kDa protein. The use of aiECD gave better sequence coverage than CAD and yielded extensive sequence information from the transmembrane domains [25]. It was concluded that transmembrane helices are stable in the chromatographic system used and remained intact in the gas phase until thermal excitation.

Another development that extends the utility of ECD is the related technique, electron-transfer dissociation (ETD), whereby an anion is used to supply the electron for the dissociation [27]. ETD has been implemented on linear ion trap mass spectrometers and promises to make the technique more widely available than ECD. Since top-down really needs the resolution afforded by FT-ICR, ETD on the linear ion trap is unsatisfactory for larger precursor ions. Implementation of ETD on the Orbitrap analyzer will help in this respect. Typically, top-down mass spectrometry is not done online because the experiments take longer than the chromatographic timescale allows and must be set up manually. There is current excitement that ETD might be more suitable for online top-down but this remains to be demonstrated.
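The relationship between the two cleavage chemistries and the four ion series can be made concrete with a short calculation. The sketch below lists singly protonated monoisotopic m/z values for b and y ions (peptide bond cleavage, CAD) and for c and z• ions (N–Cα cleavage, ECD/ETD) at each backbone position. The residue masses are standard monoisotopic values, the function name is hypothetical, and the peptide used in the example is an arbitrary placeholder rather than a sequence from this chapter.

```python
# Monoisotopic residue masses (Da); standard tabulated values.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
NH3 = 17.026549
H_ATOM = 1.007825


def fragment_ions(sequence):
    """Singly protonated b, y, c and z-dot m/z for every backbone position."""
    total = sum(MONO[aa] for aa in sequence)
    ions, prefix = [], 0.0
    for i, aa in enumerate(sequence[:-1], start=1):
        prefix += MONO[aa]                   # residues 1..i (N-terminal part)
        suffix = total - prefix              # residues i+1..n (C-terminal part)
        b = prefix + PROTON
        y = suffix + WATER + PROTON
        ions.append({
            "n_len": i,                      # index of the b/c ion
            "c_len": len(sequence) - i,      # index of the y/z ion
            "b": b,
            "y": y,
            "c": b + NH3,                    # c ion = b ion + NH3
            "z_dot": y - NH3 + H_ATOM,       # z-dot ion = y ion - NH3 + H
        })
    return ions


for ion in fragment_ions("GASVT"):           # placeholder peptide
    print(ion["n_len"], round(ion["b"], 4), round(ion["c"], 4),
          ion["c_len"], round(ion["y"], 4), round(ion["z_dot"], 4))
```

For multiply charged fragments, as are typical in top-down spectra, the neutral fragment mass is obtained by subtracting one proton mass and the m/z for charge z follows by adding z protons and dividing by z.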
Currently, we collect fractions during LC-MS on a low-resolution mass analyzer (LC-MS+) and then use these fractions for top-down experiments. First a full mass range scan is performed to define suitable precursor ions. The most intense ions from some membrane proteins sometimes fall higher than m/z 2,000, dictating use of extended mass range on typical ion trap mass spectrometers. Then a selected ion scan is used to inspect the chosen precursor ion and define conditions for CAD whereby 70–90% of the precursor is dissociated. Finally the product ion spectrum is collected over the full mass range. The sequence coverage achieved in a top-down analysis of the c-subunit benefited immensely from scanning to m/z 3,000 [25]. For proteins up to around 5,000 Da, a single scan may give good sequence coverage but for most top-down work multiple scans are averaged. Scans (or preferably FT transients) are typically averaged until visual inspection confirms good signal to noise on product ions. Even prolonged data collection may not yield full sequence coverage, however, and it makes sense to supplement CAD experiments with ECD for practical expansion of coverage.
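When choosing a precursor ion and scan range it is useful to tabulate where each charge state of a protein is expected to appear. The following sketch computes the m/z of the protonated charge states of a given neutral mass and flags those above a scan limit. The 8,000 Da mass is a round-number stand-in for a protein of the size of the c-subunit, not a measured value, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da


def charge_state_series(neutral_mass, z_min, z_max):
    """m/z of the [M + zH]z+ ions for each charge z in the given range."""
    return {z: (neutral_mass + z * PROTON) / z
            for z in range(z_min, z_max + 1)}


def above_limit(series, mz_limit=2000.0):
    """Charge states whose m/z exceeds the upper limit of the scan range."""
    return sorted(z for z, mz in series.items() if mz > mz_limit)


series = charge_state_series(8000.0, 3, 8)   # illustrative mass only
for z, mz in series.items():
    print(f"{z}+  m/z {mz:.2f}")
print("needs extended range:", above_limit(series))
```

Low charge states, which are common for hydrophobic proteins, are the ones that fall beyond m/z 2,000; the same reasoning applies to large product ions, which is why scanning to m/z 3,000 improved coverage for the c-subunit.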
3.2 Data interpretation
Current data processing strategies for interpretation of top-down mass spectrometry datasets are laborious. Firstly, the MSMS data must be converted from m/z to m. It is easy to calculate z based upon the 12C/13C isotopomer spacing (1/z) when spectra have been collected on a high-resolution instrument. Unfortunately, for larger ions typical in top-down experiments the most abundant isotopomers contain several 13C atoms and the monoisotopic peak with all 12C is practically undetectable. Thus the problem arises as to how to assign monoisotopic mass. In principle this is achieved by modeling the theoretical isotopomer distribution of a known atomic formula onto the experimental profile. But of course for an unassigned peak in an MSMS spectrum the atomic formula is not known. In practice, the most abundant isotopomer peak is assumed to be close to the average mass of the molecule and the atomic formula is estimated by dividing the average mass by the mass of the average amino acid residue, known as ‘averagine’. With this artificial atomic formula it is then possible to map the position of the monoisotopic peak onto the experimental data. It is also possible to be off by one or two 13C atoms, especially if the data quality is marginal. The first example of software to perform the m/z to m deconvolution was called THRASH [28] and more recent examples have appeared [29]. Mascot Distiller (Matrix Science) and Xtract (Thermo) are commercially available and Magtran is available as freeware.

Once the mass peaklist is obtained the dataset is ready for further analysis, typically extraction of ‘sequence tags’ for protein identification [30]. Sequence tags are derived from mass differences characteristic of amino acid residues in the peaklist and they are independent of the N- or C-terminus of the protein. The short sequence tags extracted from the mass peaklist are then used for a database homology search using software similar to BLAST [31]. The Prosight PTM website (https://prosightptm.scs.uiuc.edu/) has a suite of tools for top-down mass spectrometry data interpretation including extraction and searching of sequence tags [32,33].
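The first two steps of the m/z to m conversion described above, assigning z from the isotopomer spacing and estimating an atomic formula from averagine, can be sketched as follows. This is not the THRASH algorithm, which goes on to fit the modeled isotopomer distribution to the data; it only illustrates the arithmetic. The averagine composition and residue mass used here are the commonly quoted values and are an assumption of this sketch, since the chapter does not state them; the function names are hypothetical.

```python
C13_C12_DELTA = 1.003355   # mass difference between 13C and 12C, Da
PROTON = 1.007276

# Commonly quoted averagine residue composition and average mass (assumed here).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254


def charge_from_spacing(spacing_mz):
    """Charge state from the m/z spacing of adjacent isotopomer peaks (1/z)."""
    return round(C13_C12_DELTA / spacing_mz)


def neutral_mass(mz, z):
    """Neutral mass of the species observed at m/z as an [M + zH]z+ ion."""
    return z * (mz - PROTON)


def averagine_formula(average_mass):
    """Estimate an atomic formula by scaling averagine to the given mass."""
    n_residues = average_mass / AVERAGINE_MASS
    return {el: round(n_residues * k) for el, k in AVERAGINE.items()}


z = charge_from_spacing(0.2007)        # illustrative spacing between isotopomers
mass = neutral_mass(1601.007276, z)    # illustrative m/z of a peak
print(z, round(mass, 3), averagine_formula(mass))
```

Because the estimated formula is rounded to whole atoms, the modeled isotopomer envelope can sit one or two 13C positions away from the true one, which is the source of the off-by-one errors mentioned above.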
Commercial software for top-down analyses is starting to appear, for example ProSight PC (Thermo Fisher). Once the protein is identified, it is very rare that the mass calculated for the reported sequence will match the mass measured in the experiment, due to deviations in primary structure arising from sequence errors, post-translational modifications and so on. For complete assignment of primary structure from the MSMS dataset a complex manual interpretation phase is necessary to maximize the number of ions assigned to the structure (Figure 4). It is typical to work on adjusting the N- and the C-terminus until sets of ions (b and y, c and z•) start matching the sequence being tested. The overall goal is to assign a sequence that agrees with the measured mass of the parent ion and accounts for as many N- and C-terminal fragments as possible. Other ions in the MSMS spectrum could arise from loss of water (−18 Da) or ammonia (−17 Da), or from internal fragments where multiple dissociation events have occurred. In principle it should be possible to assign all ions in the MSMS spectrum, but this is rarely the case in practice. It is unlikely that the ions in the MSMS dataset will provide full sequence coverage across every bond, so some reliance upon genomic data is retained. The endpoint in data interpretation is somewhat subjective, but ideally the majority of ions have been assigned and the measured and calculated masses are in full agreement (Figure 5). The typical ion isolation window used for top-down experiments is several daltons wide in order to span the entire isotopomer envelope of the molecular ion.
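The fragment-assignment step can be illustrated with a minimal sketch that matches a deconvoluted peaklist against b and y fragments of a candidate sequence, including the water and ammonia losses mentioned above. Several things here are assumptions made for the example rather than the author's procedure: the peaklist is taken to hold neutral monoisotopic masses, a neutral b fragment is taken as the sum of its residue masses and a neutral y fragment as that sum plus water, the 10 ppm tolerance is arbitrary, and the function names and the test sequence are hypothetical. c and z• ions and internal fragments are left out for brevity.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O
AMMONIA = 17.026549  # NH3


def fragment_masses(seq):
    """Neutral monoisotopic masses of b and y fragments for each bond."""
    total = sum(RESIDUE[aa] for aa in seq)
    frags = {}
    running = 0.0
    for i, aa in enumerate(seq[:-1], start=1):
        running += RESIDUE[aa]
        frags[f"b{i}"] = running                           # N-terminal piece
        frags[f"y{len(seq) - i}"] = total - running + WATER  # C-terminal piece
    return frags


def assign(peaks, seq, tol_ppm=10.0):
    """Match observed masses to b/y fragments and their neutral losses."""
    candidates = {}
    for name, mass in fragment_masses(seq).items():
        candidates[name] = mass
        candidates[name + "-H2O"] = mass - WATER
        candidates[name + "-NH3"] = mass - AMMONIA
    hits = []
    for obs in peaks:
        for name, theo in candidates.items():
            ppm = (obs - theo) / theo * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((obs, name, round(ppm, 1)))
    return hits


if __name__ == "__main__":
    # Hypothetical short sequence and peaklist, for illustration only.
    for hit in assign([226.0954, 375.1642, 500.0], "PEPTIDE"):
        print(hit)
```

Peaks that remain unmatched after such a pass (the 500.0 entry here) are the ones that drive the manual phase: adjusting termini, testing modifications, or considering internal fragments.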
Figure 4 A variety of intact protein gas-phase dissociation strategies provide versatile options for top-down mass spectrometry. The 5+ protonated ion of the ATP synthase c-subunit (AtpH, Arabidopsis thaliana) was subjected to ECD, aiECD and CAD on a hybrid linear ion trap FT-ICR mass spectrometer. Fragment assignments from CAD (b and y ions), ECD, and aiECD (c and z• ions) experiments are mapped to the sequence of AtpH. The c and z• fragments marked by the @ symbol were present in both conventional ECD and activated ion ECD spectra. The b and y fragments marked by symbol are present in both CAD and aiECD spectra; and by # symbol – only in aiECD spectra. The c and z• fragments in grey were manually annotated. Transmembrane domains are shaded, demonstrating the improved coverage afforded by aiECD. The numbering on the right-hand side of the figure is reversed for counting y and z• ions. Modified from Zabrouskov and Whitelegge [25]. Permission obtained from American Chemical Society, Washington, DC, 2007.
Narrowing the window typically cuts ion transmission to unacceptable levels, and complicates estimation of the monoisotopic peak. Consequently, although we usually consider that we are working with a single isolated ion, it should be remembered that a real protein population often exhibits microheterogeneity, such that a mixture of isobaric/isomeric isoforms is included at a particular nominal mass. Interpretation of the MSMS spectrum needs to take this into account. The PRP3 protein from human saliva, with a mass of ~10,999 Da, was recently demonstrated to be a mixture that included isobaric variation (N replacing D in the published sequence) and isomeric variation (D4N versus D50N) [34]. By careful analysis of the high-resolution MSMS spectrum it was possible to conclude that the D4N isoform constituted around 50% of the population while D50N made up around 30%. Such considerations should be noted when reviewing product ion mass accuracy in top-down experiments.
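To make the isomer argument concrete, the sketch below works out which fragments can tell two positional isomers apart when each carries a single D to N substitution. Only fragments that contain one of the two sites but not the other differ in mass between the isomers, and they do so by the Asn minus Asp residue mass difference of about −0.984 Da. The function name is hypothetical, and the protein length of 100 used in the example is an arbitrary placeholder rather than the length of PRP3.

```python
ASP = 115.02694  # monoisotopic residue mass of Asp (D), Da
ASN = 114.04293  # monoisotopic residue mass of Asn (N), Da
DELTA_D_TO_N = ASN - ASP  # about -0.984 Da per substitution


def discriminating_fragments(length, pos_a, pos_b):
    """Fragment sizes whose mass differs between two positional isomers.

    Each isomer carries one D->N substitution, at pos_a or at pos_b
    (1-based residue positions). Returns the residue counts of the
    N-terminal fragments (b or c type) and C-terminal fragments
    (y or z type) that contain exactly one of the two sites.
    """
    lo, hi = sorted((pos_a, pos_b))
    # An N-terminal fragment of i residues covers positions 1..i,
    # so it includes lo but not hi when lo <= i < hi.
    n_term = list(range(lo, hi))
    # A C-terminal fragment of j residues covers positions
    # length-j+1..length, so it includes hi but not lo when
    # length-hi+1 <= j <= length-lo.
    c_term = list(range(length - hi + 1, length - lo + 1))
    return n_term, c_term


if __name__ == "__main__":
    # Placeholder length of 100 residues; sites 4 and 50 as in the text.
    n_term, c_term = discriminating_fragments(100, 4, 50)
    print(round(DELTA_D_TO_N, 5))
    print(n_term[0], n_term[-1], c_term[0], c_term[-1])
```

Fragments outside these ranges have the same mass in both isomers, so relative isoform abundance can only be judged from the pairs of shifted and unshifted peaks inside them.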
3.3 Future considerations
Exciting prospects for top-down mass spectrometry can be conceived if sophisticated algorithms can be brought to bear on data interpretation and eventually integrated with data-dependent acquisition strategies. The original vision of top-down proteomics described by Fred McLafferty and Neil Kelleher [35] will only be realized with the development of software to accelerate the throughput of the technique. Kelleher’s group has pioneered software development in top-down proteomics, and recently described the use of shotgun databases to encompass diverse combinations of post-translational modifications in histones [36]. The disadvantage of such an approach
Figure 5 Current data processing strategies are laborious. The subject must first be identified, typically via use of sequence tags. An iterative process of manual sequence assignment then continues until an endpoint is reached – usually agreement of the calculated mass for the assigned structure and the measured mass within experimental error. Review of top-down data requires access to raw data, peaklists derived automatically and/or manually, and the output of the sequence assignment algorithm.
is that the database gets very large because of the need to house all the different structural combinations, and that only structural possibilities included in the database are considered in the search. An ideal approach would remain unbiased in order to include previously unknown sequence variants and post-translational modifications [34]. It is likely that bioinformaticians experienced in genomics will contribute to software development in this arena [37]. Although the FT-ICR MS has been the accepted platform for top-down mass spectrometry, instrument development is advancing rapidly. The Makarov analyzer (Orbitrap, Thermo Fisher) [38,39] brings high resolution at a lower cost and will become competitive for smaller proteins [40]. Another point to consider is that resolving power on the Orbitrap decreases with the square root of m/z, whereas on the FT-ICR it decreases linearly with m/z. Thus there may be advantages to the Orbitrap while working at extended m/z range (>3,000) [24,25]. There is currently speculation that the ETD process is faster than ECD, and thus implementation of ETD on the Orbitrap is eagerly anticipated for top-down proteomics. If quality top-down mass spectra can be generated on the chromatographic timescale there will likely be explosive growth in the field.
The development of hybrid mass spectrometer combinations such as the linear ion trap FT-ICR or the linear ion trap Orbitrap has revolutionized the field. Further instrument development that embraces ion-mobility separations as well as the latest ionization technology, as described above, could bring exciting new possibilities to top-down membrane protein mass spectrometry.
REFERENCES
1 C.C. Wu, M.J. MacCoss, K.E. Howell and J.R. Yates, 3rd, A method for the comprehensive proteomic analysis of membrane proteins, Nat. Biotechnol., 21(5) (2003) 532–538.
2 J. Blonder, T.P. Conrads, L.R. Yu, A. Terunuma, G.M. Janini, H.J. Issaq, J.C. Vogel and T.D. Veenstra, A detergent- and cyanogen bromide-free method for integral membrane proteomics: Application to Halobacterium purple membranes and the human epidermal membrane proteome, Proteomics, 4(1) (2004) 31–45.
3 J.P. Whitelegge, S.M. Gómez and K.F. Faull, Proteomics of membrane proteins. Proteome characterization and proteomics. In: R.D. Smith and T. Veenstra (Eds.), Adv. Protein Chem., 65 (2003) 271–307.
4 S.M. Gómez, J.N. Nishio, K.F. Faull and J.P. Whitelegge, The chloroplast grana proteome defined by intact mass measurements from LC-MS, Mol. Cell Proteomics, 1 (2002) 45–59.
5 W. Fischle, Y. Wang and C.D. Allis, Binary switches and modification cassettes in histone biology and beyond, Nature, 425(6957) (2003) 475–479.
6 S.M. Gómez, K.Y. Bil’, R. Aguilera, J.N. Nishio, K.F. Faull and J.P. Whitelegge, Transit peptide cleavage sites of integral thylakoid membrane proteins, Mol. Cell Proteomics, 2 (2003) 1068–1085.
7 J.P. Whitelegge, HPLC and mass spectrometry of intrinsic membrane proteins. In: M.-I. Aguilar (Ed.), Methods in Molecular Biology (Volume 251). HPLC of Peptides and Proteins, Humana Press Inc., Totowa, N.J., 2004, pp. 323–339.
8 J.P. Whitelegge, C. Gundersen and K.F. Faull, Electrospray-ionization mass spectrometry of intact intrinsic membrane proteins, Protein Sci., 7 (1998) 1423–1430.
9 J.P. Whitelegge, R. Aguilera, H. Zhang, R. Taylor and W.A. Cramer, Full subunit coverage liquid chromatography electrospray-ionization mass spectrometry (LCMS+) of an oligomeric membrane protein: Cytochrome b6f complex from Spinach and the cyanobacterium, M. laminosus, Mol. Cell Proteomics, 1 (2002) 816–827.
10 G. Kurisu, H. Zhang, J.L. Smith and W.A. Cramer, Structure of the cytochrome b6f complex of oxygenic photosynthesis: Tuning the cavity, Science, 302(5647) (2003) 1009–1014.
11 D. Stroebel, Y. Choquet, J.L. Popot and D. Picot, An atypical haem in the cytochrome b(6)f complex, Nature, 426(6965) (2003) 413–418.
12 G.E. Tarr and J.W. Crabb, Reverse-phase high-performance liquid chromatography of hydrophobic proteins and fragments thereof, Anal. Biochem., 131(1) (1983) 99–107.
13 J.P. Whitelegge, P. Jewess, M.G. Pickering, C. Gerrish, P. Camilleri and J.R. Bowyer, Sequence analysis of photoaffinity-labelled peptides derived by proteolysis of photosystem 2 reaction centers from thylakoid membranes treated with (14C)-azidoatrazine, Eur. J. Biochem., 207 (1992) 1077–1084.
14 D. Wessel and U.I. Flügge, A method for the quantitative recovery of protein in dilute solution in the presence of detergents and lipids, Anal. Biochem., 138(1) (1984) 141–143.
15 J. Carroll, I.M. Fearnley and J.E. Walker, Definition of the mitochondrial proteome by measurement of molecular masses of membrane proteins, Proc. Natl. Acad. Sci. USA, 103(44) (2006) 16170–16175.
16 E. Terzi, P. Boyot, A. Van Dorsselaer, B. Luu and E. Trifilieff, Isolation and amino acid sequence of a novel 6.8-kDa mitochondrial proteolipid from beef heart. Use of FAB-MS for molecular mass determination, FEBS Lett., 260(1) (1990) 122–126.
17 A. Delobel, F. Halgand, B. Laffranchise-Gosse, H. Snijders and O. Laprévote, Characterization of hydrophobic peptides by atmospheric pressure photoionization-mass spectrometry and tandem mass spectrometry, Anal. Chem., 75(21) (2003) 5961–5968.
18 N. Morgner, T. Kleinschroth, H.D. Barth, B. Ludwig and B. Brutschy, A novel approach to analyze membrane proteins by laser mass spectrometry: From protein subunits to the integral complex, J. Am. Soc. Mass Spectrom., 5 (2007). [Epub ahead of print].
19 J.S. Sampson, A.M. Hawkridge and D.C. Muddiman, Generation and detection of multiply charged peptides and proteins by matrix-assisted laser desorption electrospray ionization (MALDESI) Fourier transform ion cyclotron resonance mass spectrometry, J. Am. Soc. Mass Spectrom., 17(12) (2006) 1712–1716.
20 R.L. Grimm and J.L. Beauchamp, Dynamics of field-induced droplet ionization: Time-resolved studies of distortion, jetting, and progeny formation from charged and neutral methanol droplets exposed to strong electric fields, J. Phys. Chem. B, 109(16) (2005) 8244–8250.
21 J.A. Loo, H.R. Udseth and R.D. Smith, Peptide and protein analysis by electrospray ionization-mass spectrometry and capillary electrophoresis-mass spectrometry, Anal. Biochem., 179(2) (1989) 404–412.
22 I.M. Fearnley and J.E. Walker, Analysis of hydrophobic proteins and peptides by electrospray ionization MS, Biochem. Soc. Trans., 24(3) (1996) 912–917.
23 J.P. Whitelegge, Tandem mass spectrometry of integral membrane proteins for top-down proteomics, Trends Anal. Chem., 24 (2005) 576–582.
24 J.P. Whitelegge, F. Halgand, P. Souda and V. Zabrouskov, Top-down mass spectrometry of integral membrane proteins, Expert Rev. Proteomics, 3(6) (2006) 585–596.
25 V. Zabrouskov and J.P. Whitelegge, Increased coverage in the transmembrane domain with activated-ion electron capture dissociation for top-down Fourier-transform mass spectrometry of integral membrane proteins, J. Proteome Res., 6(6) (2007) 2205–2210.
26 R.A. Zubarev, N.L. Kelleher and F.W. McLafferty, Electron capture dissociation of multiply charged protein cations. A nonergodic process, J. Am. Chem. Soc., 120(13) (1998) 3265–3266.
27 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101 (2004) 9528–9533.
28 D.M. Horn, R.A. Zubarev and F.W. McLafferty, Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules, J. Am. Soc. Mass Spectrom., 11(4) (2000) 320.
29 L. Chen, S.K. Sze and H. Yang, Automated intensity descent algorithm for interpretation of complex high-resolution mass spectra, Anal. Chem., 78(14) (2006) 5006–5018.
30 E. Mørtz, P.B. O’Connor, P. Roepstorff, N.L. Kelleher, T.D. Wood, F.W. McLafferty and M. Mann, Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence databases, Proc. Natl. Acad. Sci. USA, 93 (1996) 8264–8267.
31 S.F. Altschul, W. Gish, W. Miller, E.W. Myers and D.J. Lipman, Basic local alignment search tool, J. Mol. Biol., 215 (1990) 403–410.
32 G.K. Taylor, Y.B. Kim, A.J. Forbes, F. Meng, R. McCarthy and N.L. Kelleher, Web and database software for identification of intact proteins using ‘‘top down’’ mass spectrometry, Anal. Chem., 75(16) (2003) 4081–4086.
33 R.D. LeDuc, G.K. Taylor, Y.B. Kim, T.E. Januszyk, L.H. Bynum, J.V. Sola, J.S. Garavelli and N.L. Kelleher, ProSight PTM: An integrated environment for protein identification and characterization by top-down mass spectrometry, Nucleic Acids Res., 32(Web Server issue) (2004) W340–W345.
34 J.P. Whitelegge, V. Zabrouskov, F. Halgand, P. Souda, S. Bassilian, W. Yan, L. Wolinsky, J.A. Loo, D.T. Wong and K.F. Faull, Protein-sequence polymorphisms and post-translational modifications in proteins from human saliva using top-down Fourier-transform ion cyclotron resonance mass spectrometry, Int. J. Mass Spectrom., 268 (2007) 190–197.
35 N.L. Kelleher, H.Y. Lin, G.A. Valaskovic, D.J. Aaserud, E.K. Fridriksson and F.W. McLafferty, Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry, J. Am. Chem. Soc., 121 (1999) 806–807.
36 J.J. Pesavento, Y.B. Kim, G.K. Taylor and N.L. Kelleher, Shotgun annotation of histone modifications: A new approach for streamlined characterization of proteins by top down mass spectrometry, J. Am. Chem. Soc., 126(11) (2004) 3386–3387.
37 D. Tsur, S. Tanner, E. Zandi, V. Bafna and P.A. Pevzner, Identification of post-translational modifications by blind search of mass spectra, Nat. Biotechnol., 23(12) (2005) 1562–1567.
38 A. Makarov, Electrostatic axially harmonic orbital trapping: A high-performance technique of mass analysis, Anal. Chem., 72 (2000) 1156.
39 Q. Hu, R.J. Noll, H. Li, A. Makarov, M. Hardman and G. Cooks, The Orbitrap: A new mass spectrometer, J. Mass Spectrom., 40(4) (2005) 430–443.
40 B. Macek, L.F. Waanders, J.V. Olsen and M. Mann, Top-down protein sequencing and MS3 on a hybrid linear quadrupole ion trap-orbitrap mass spectrometer, Mol. Cell Proteomics, 5(5) (2006) 949–958.