Analytica Chimica Acta 566 (2006) 147–156
Improvements in protein identification confidence and proteome coverage for human liver proteome study by coupling a parallel mass spectrometry/mass spectrometry analysis with multi-dimensional chromatography separation Jie Zhang a , Mingxia Gao a , Jia Tang a , Pengyuan Yang a , Yinkun Liu b , Xiangmin Zhang a,∗ a
Department of Chemistry & Research Center of Proteome Fudan University, Shanghai 200433, China b Zhongshan Hospital, Fudan University, Shanghai 200232, China Received 26 November 2005; received in revised form 3 March 2006; accepted 6 March 2006 Available online 22 March 2006
Abstract Recently, matrix-assisted laser desorption ionization (MALDI) technique has been shown to be complementary to electrospray ionization (ESI) with respect to the population of peptides and proteins that can be detected. In this study, we tried to hyphenate MALDI–TOF–TOF–MS and ESI–QUADRUPOLE–TOF–MS with a single 2D liquid chromatography for complicated protein sample analysis. The effluents of RPLC were split into two parts for the parallel MS/MS detection. After optimizing the operation conditions in LC separation and MS identification, a total of 1149 proteins were identified from the global lysate of normal human liver (NHL) tissue. Compared to the single MS/MS detection, the combined analysis increased the number of proteins identified (more than 25%) and enhanced the protein identification confidence. Proteins identified were categorized and analyzed based upon their cellular location, biological process and molecular function. The identification results demonstrated the application potential of a parallel MS/MS analysis coupled with multi-dimensional LC separation for complicated protein sample identification, especially for proteome analysis, such as human tissues or cells extracts. © 2006 Elsevier B.V. All rights reserved. Keywords: 2D LC; Parallel MS/MS analysis; Human liver proteome
1. Introduction Proteomics aims at the global analysis of tissue and cellular proteins, and uses a combination of techniques including two-dimensional gel electrophoresis (2DE) or liquid chromatography, image analysis, mass spectrometry, and bio-informatics to resolve comprehensively, to quantify, and to characterize proteins. As a complement to 2DE, multi-dimensional chromatography has drawn more and more attention in the field of proteome research. In addition to providing far greater resolving power for a diverse set of proteins, multi-dimensional chromatography approaches also offer versatility, automation, sensitivity, reproducibility and high throughput [1,2]. Yates and coworkers describe an automated method for shotgun proteomics named multi-dimensional protein identification technology (MudPIT) [3,4]. MudPIT integrates a strong cation-exchange (SCX) resin
∗
Corresponding author. Tel.: +86 21 5566 4165; fax: +86 21 6564 1740. E-mail address:
[email protected] (X. Zhang).
0003-2670/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2006.03.045
and reversed-phase resin in a biphasic column and has been successfully applied to the analysis of protein expression profiles. An on-line nano-flow 2D liquid chromatography system for proteome analysis was also developed in our lab [5]. In this 2D LC system, SCX separation and RPLC separation were performed in two different columns. Protein digests were first separated according to their charge by SCX; the peptides on the SCX column were then displaced by step-gradient salt injections and the fractions were trapped on a precolumn. The precolumn was then switched into the flow path of the RPLC system, and the subsequent separation was performed on the RPLC column according to peptide hydrophobicity. Mass spectrometry is widely used in protein identification, and it commonly uses electrospray ionization (ESI) or matrix-assisted laser desorption ionization (MALDI) for sample introduction. ESI–MS/MS is the usual choice for coupling with LC to perform on-line protein identification, while MALDI–MS is often coupled with gel methods. Compared to ESI, MALDI is superior in detection sensitivity due to its remarkable tolerance to buffers, salts and additives in samples. With the obvious improvements in laser shot frequency and automation of spectra
J. Zhang et al. / Analytica Chimica Acta 566 (2006) 147–156
scanning, some research groups have tried coupling MALDI–MS to 1D LC for protein identification [6–8], but work on coupling MALDI–MS with 2D LC, especially for complicated protein sample analysis, has not been reported. ESI and MALDI are two different ionization techniques, and they can provide both analogous and complementary information on the same sample [9]. The overall extracts of mammalian cells or tissues contain a diverse set of proteins varying from hydrophilic to hydrophobic (including integral membrane proteins), from highly acidic to neutral to highly basic, and from high abundance to low abundance. To resolve such a complicated protein mixture effectively, it is necessary to apply 2D liquid chromatography in the separation. Furthermore, to obtain a more comprehensive analysis of the overall protein extracts, a parallel identification combining ESI–MS/MS and MALDI–MS/MS can be coupled with 2D liquid chromatography. This work is the first report of combining a 2D LC system with parallel MALDI–MS/MS and ESI–MS/MS for mammalian proteome research. With the rapid development of proteomics technologies, the proteomics approach is being actively applied to the molecular analysis of various human cancers such as liver [10], bladder [11], colorectal [12,13], breast [14] and stomach [15]. Liver cancer is one of the leading causes of cancer deaths worldwide. Each year, the 100,000 deaths caused by hepatocellular carcinoma (HCC) in China account for 45% of all HCC deaths worldwide. Although incidence rates in the western world are lower than in Asia, HCC is still a significant worldwide health burden.
Establishing suitable databases of identifiable proteins in homogenesis HCC samples, HCC metastasis samples and normal human liver samples, and comparing these databases comprehensively, can help us apply proteomic analyses to pathological perturbations and eventually find more disease-related pathways and key molecules. That is why the China Human Liver Proteome Project (CHLPP) has been initiated. As a participant lab in CHLPP, we employed a 2D LC–parallel ESI–MS/MS and MALDI–MS/MS strategy for profiling the expressed proteins in normal human liver (NHL) tissue. In this study, two mass spectrometers, the QSTARXL (ESI–QUADRUPOLE–TOF–MS) and the 4700 PROTEOMICS ANALYZER, a powerful MALDI–TOF–TOF–MS with the capacity to analyze 3800 spots per day, were selected as detectors for protein identification. The identification results demonstrated that the parallel MS/MS analysis greatly improved the proteome coverage and the protein identification confidence for the NHL sample compared to the single ESI- (or MALDI-) MS/MS analysis.
2. Experimental
2.1. Chemicals
HPLC grade acetonitrile (ACN) was from Fisher Scientific (Fairlawn, NJ). The water used was MilliQ grade (Millipore, Bedford, MA, USA). HPLC grade formic acid (FA), trifluoroacetic acid (TFA), urea, phenylmethylsulfonyl fluoride (PMSF), sequencing grade trypsin, dithiothreitol (DTT) and iodoacetamide (IAA) were obtained from Sigma (St. Louis, MO). α-Cyano-4-hydroxycinnamic acid (CHCA) was from Aldrich (Milwaukee, WI). All chemicals used in buffer solutions were analytical grade.
2.2. Protein preparation and digestion
The normal human liver tissue specimens were provided by the Liver Cancer Institute of Zhongshan Hospital, Fudan University. All manipulations in the collection of liver specimens were approved by the Institutional Bioethics Committee, and informed consent was given by all participants. The liver tissue specimens were diced and washed with cold physiological saline solution (0.9% NaCl solution) to remove blood and other possible contaminants. The one-step protein extraction procedure was as follows: specimens were homogenized in lysis buffer containing 8 M urea and a mixture of protease inhibitors and phosphatase inhibitors (Cocktail tablet, 0.2 mM Na2VO3 and 1 mM NaF) with a sample/buffer ratio of 1/8 (w/v), and extraction was aided by vortexing for 30 min. All the extraction operations were performed at 0 °C. The suspension was then centrifuged at 18,000 × g for 1 h (4 °C), and the supernatant containing the total liver proteins was collected. The extracted protein concentration was measured by a Bio-Rad assay using bovine serum albumin (BSA) as standard. Before separation, the proteins were reduced with 10 mM DTT at 37 °C for 1 h and then alkylated with 25 mM IAA for an additional 30 min at room temperature in the dark. After diluting the urea to 2 M with 50 mM ammonium bicarbonate (pH 8.5), a tryptic digestion was performed overnight at 37 °C.
2.3. Comprehensive micro-SCX–RPLC–MS/MS system
Peptide separations were performed with an UltiMateTM micro-scale LC system combined with a FAMOS micro-autosampler and a SWITCHOS valve unit from LC Packings (Amsterdam, The Netherlands). A FAMOS micro-autosampler with an additional built-in six-port valve was used for sample injection.
Elution from the strong cation-exchange (SCX) column, sample preconcentration and desalting on the trap column were performed with the auxiliary LC quaternary pump built into the SWITCHOS, which was operated isocratically at a flow rate of 40 µL/min. Before separation, the SCX column was equilibrated with buffer consisting of 5% acetonitrile (v/v) and 0.1% formic acid (v/v). Then, 40 µL of trypsin-digested extract was injected onto the SCX column (Bio-SCX, 1 mm i.d. × 150 mm, LC Packings). The flow-through peptides were captured on a cartridge-type trap column (C18 PepMapTM, 300 µm i.d. × 5 mm, LC Packings); the trap column was desalted with sample loading buffer and then switched on-line with the RPLC column for further separation. The subsequent cycles of displacing and separating SCX fractions followed the same procedure. A step gradient of ammonium acetate was used to elute the peptides from the SCX column. In each cycle, 200 µL of ammonium acetate was injected onto the SCX column for peptide elution. A total of 12 elution cycles were performed. The elution buffer concentrations were as follows: 5, 10, 20, 40, 80, 160, 320, 640, 800, 1000, 1500 and 2000 mM.
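The salt-step schedule above can be written down as data, which makes the cycle count and the total salt volume easy to check. This is only an illustrative sketch: the names `SCX_STEPS_MM`, `PLUG_VOLUME_UL` and `scx_schedule` are hypothetical, and counting the flow-through as cycle 0 assumes it was analysed in the same way as the salt fractions.

```python
# Illustrative sketch of the SCX salt-step schedule described in the text.
# All names here are hypothetical; only the numbers come from the text.
SCX_STEPS_MM = [5, 10, 20, 40, 80, 160, 320, 640, 800, 1000, 1500, 2000]
PLUG_VOLUME_UL = 200  # ammonium acetate volume injected per elution cycle


def scx_schedule(steps_mm=SCX_STEPS_MM, plug_ul=PLUG_VOLUME_UL):
    """Return one dict per fraction; cycle 0 is the flow-through (no salt)."""
    schedule = [{"cycle": 0, "salt_mM": 0, "plug_uL": 0}]
    for i, conc in enumerate(steps_mm, start=1):
        schedule.append({"cycle": i, "salt_mM": conc, "plug_uL": plug_ul})
    return schedule


schedule = scx_schedule()
total_salt_volume_ul = sum(step["plug_uL"] for step in schedule)
print(len(schedule), "fractions,", total_salt_volume_ul, "uL of salt solution")
```

Keeping the schedule as a plain list means a different step series can be tried by passing another `steps_mm` argument without touching the function.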
RPLC analysis was performed with a home-made reversed-phase C8 column (300 mm × 250 µm, packed with 5 µm, 300 Å particles, Zorbax 300SB stationary phase). A flow rate of 2 µL/min was used for separation. Gradient elution was performed with solvent A: 5% acetonitrile (v/v) + 0.1% formic acid (v/v) and solvent B: 95% acetonitrile (v/v) + 0.1% formic acid (v/v). The gradient was as follows: an isocratic elution with 0% B lasted for 18 min, followed by a ramp to 10% B within 6 min, a linear increase to 50% B at 104 min and an abrupt increase to 95% B within 0.1 min. The 95% B was maintained for 5 min and then ramped down to 0% B for column re-equilibration. A post-column splitter was configured in the 2D system to split the flow into two parts: 200 nL/min to the nano-electrospray source of the QSTARXL and 1.8 µL/min to the Probot, an automated spotter (Dionex, Sunnyvale, CA). In the spotter, the LC effluent from the capillary column was mixed with matrix solution through a coaxial stainless steel needle prior to deposition on MALDI target plates.
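The gradient program and the post-column split described above can be expressed as a small piecewise-linear model. This is a minimal sketch rather than instrument control code: `GRADIENT`, `percent_b` and `split_fraction` are hypothetical names, and the re-equilibration ramp is left out because its duration is not stated.

```python
# Piecewise-linear sketch of the RPLC gradient described in the text.
# Breakpoints are (time in min, % solvent B). Re-equilibration is not modelled.
GRADIENT = [
    (0.0, 0.0),
    (18.0, 0.0),    # isocratic hold at 0% B
    (24.0, 10.0),   # ramp to 10% B within 6 min
    (104.0, 50.0),  # linear increase to 50% B at 104 min
    (104.1, 95.0),  # abrupt step to 95% B within 0.1 min
    (109.1, 95.0),  # 95% B held for 5 min
]


def percent_b(t_min, points=GRADIENT):
    """Linearly interpolate % B at time t_min; clamp outside the table."""
    if t_min <= points[0][0]:
        return points[0][1]
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return points[-1][1]


def split_fraction(esi_nl_min=200.0, maldi_nl_min=1800.0):
    """Fraction of the column effluent that goes to the ESI source."""
    return esi_nl_min / (esi_nl_min + maldi_nl_min)


print(percent_b(64.0), split_fraction())
```

With the stated flows, one tenth of the effluent reaches the ESI source and the remainder is deposited for MALDI, which is consistent with the 2 µL/min column flow.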
2.4. Mass spectrometry
The QSTARXL (Applied Biosystems, USA) was used to acquire LC–ESI/MS/MS spectra. The instrument was operated in positive ion mode with a 2.5 kV electrospray voltage. An analysis cycle included 1 s of MS spectrum collection and 9 s of MS/MS scans of three precursor ions. The initial MS scan covered an m/z range of 400–2000. Precursor ion selection was based upon ion intensity (above 10 counts), charge state (+2, +3), and whether the precursor had previously been selected for interrogation (dynamic exclusion). The instrument was calibrated prior to analysis using horse myoglobin digest peptides. 2D-LC–MALDI–MS/MS analysis was performed on a 4700 Proteomics Analyzer, a MALDI–TOF–TOF mass spectrometer with delayed ion extraction (Applied Biosystems, USA). A laser with 337 nm wavelength was used as the desorption ionization source. The maximal shot frequency was 200 Hz. Spectra were acquired in reflector-positive mode with an acceleration voltage of 15 kV. The initial MS scan covered an m/z range of 900–3000. The TOF–TOF mass spectra were acquired by the result-dependent acquisition method, with 15 precursor ions selected from one previous MS scan. The list of candidate ions for MS/MS scanning was ordered by decreasing intensity and compared against exclude and adduct lists prior to generation of the final “precursor list” for automated MS/MS scanning. The m/z values of peptides produced by trypsin autolysis were entered into the exclude list. The mass calibration was done externally on the target using horse myoglobin digest peptides.
2.5. Data processing and database searching
Data obtained from the TOF/TOF and QSTARXL were searched using Mascot (version 1.6, Matrix Sciences, London, UK) as the search engine. All searches were performed against the nonredundant IPI Human database with about 39,000 entries (version 2.3.3). Data from the TOF/TOF were submitted by GPS Explorer (version 1.0, Applied Biosystems, USA) for database searching. Data from the QSTARXL were submitted to the Mascot server via custom software. That custom software combined the peak lists generated from the separately analyzed SCX fractions into one peak list prior to submission to the Mascot server. The search parameters were as follows: peptide mass tolerance, 0.09 Da; peptide fragment ion mass tolerance, 0.12 Da. One missed cleavage was allowed, the fixed modification was cysteine carboxyamidomethylation and the variable modification was methionine oxidation. Only significant hits as defined by MASCOT probability analysis were considered initially. In addition, at least one peptide match with an ion score above 25 was set as the threshold for acceptance [8,16].
2.6. Classification of identified protein
To annotate and classify the functions and locations of identified proteins, three main gene ontology (GO) categories, biological process, cellular location and molecular function, were used to map the identified proteins.
3. Results and discussion
3.1. Evaluation and optimization of the micro-SCX–RPLC system
In the comprehensive micro-scale 2D system, a 1 mm × 150 mm SCX column and a 250 µm × 300 mm RPLC column were employed for separation. The micro-flow 2D system was chosen on the premise that such a system could provide sufficient sample for both MALDI–MS and ESI–MS detection. In the MALDI method, the MS/MS spectra quality, in terms of signal intensity and S/N value, could be improved by increased laser intensity, more laser shots per subspectrum and more accumulated subspectra contributing to a single spectrum. Such an increase required sufficient sample on the MALDI target plates. Compared to a nano-SCX–RPLC system, the micro-system can load and separate more sample for further MALDI–MS/MS analysis. The resolution of micro-LC is not as good as that of nano-LC. To compensate for the resolution loss caused by micro-LC, a long RP column was used for separation. Theoretically, at constant plate height the LC resolution increases with the square root of the column length, so a longer column gives better resolution. Restricted by the pressure tolerance of the UltiMate pump, a 300 mm long column was used. The flow rate of the sample loading buffer and the sample loading time are two key parameters that determine the quality of a 2D HPLC separation. The flow rate is set according to the inside diameter of the SCX column and the volume and type of the connecting tubing. The loading time depends on the sample volume and the capacity of the loading column. Too short a loading time can result in insufficient sample loading, whereas too long a loading time makes it likely that sample is washed out of the trap column. Therefore, this parameter has to be considered and optimized carefully. In our experiment, the flow rate of the loading buffer was optimized at 40 µL/min and the sample loading time was 6 min.
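The loading conditions can be put in perspective with simple volume arithmetic. The sketch below is illustrative only: the function names are hypothetical, the trap column dimensions (300 µm i.d. × 5 mm) are taken from the experimental section, and the column volume is the empty-tube geometric volume, ignoring the packing.

```python
import math


def delivered_volume_ul(flow_ul_per_min, minutes):
    """Volume pumped through the trap column during loading (uL)."""
    return flow_ul_per_min * minutes


def empty_tube_volume_ul(inner_diameter_um, length_mm):
    """Geometric volume of a cylindrical column, ignoring the packing (uL)."""
    radius_mm = inner_diameter_um / 1000.0 / 2.0
    return math.pi * radius_mm ** 2 * length_mm  # 1 mm^3 equals 1 uL


loaded = delivered_volume_ul(40, 6)   # loading pump: 40 uL/min for 6 min
trap = empty_tube_volume_ul(300, 5)   # trap column: 300 um i.d. x 5 mm
print(f"{loaded:.0f} uL delivered, about {loaded / trap:.0f} trap-column volumes")
```

The delivered volume is several hundred times the geometric volume of the trap column, which illustrates why an overly long loading time risks washing weakly retained peptides off the trap.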
After setting the sample loading time and the flow rate of the Switchos loading pump, the time for washing trap column has to be considered. The washing of the salt or
sample matrix from the RP trap column is a key step in the 2D LC separation. If the salt is not washed out properly, it will interfere with the MS signal and lower the overall sensitivity. An injection of 200 µL of 2000 mM ammonium acetate onto the SCX column was performed by the Switchos loading pump; after 6 min of sample loading, the SCX column was switched off-line and the precolumn desalting process was monitored by a UV detector. Within another 6 min, the salt was swept out of the trap column completely, so the washing time was set to 6 min. Following the washing step, the RP trap column can be switched on-line with the micro-separation column to perform the analytical separation. Based on the sample amount and the inside diameter of the separation column, a 300 µm i.d. × 5 mm C18 PepMapTM was selected as the RP trap column. It is important to note that we performed a back-flush elution from the trap column. This enabled sample focusing at the head of the separation column and reduced the peak broadening caused by directly eluting SCX peptides onto the LC column.
At the outlet of the RP column, the LC eluent is split into two parts for MS analysis. A splitter with 3 nL dead volume is used for the split. The inside diameter of the tubes connecting the separation system to the mass spectrometers is 10 µm. The nanoliter-scale dead volume of the splitter and the 10 µm i.d. connecting tubes were chosen to reduce post-column resolution loss.
3.2. Consideration of conditions in sample deposition on MALDI target plate
CHCA is the most widely used MALDI matrix for peptide analysis [17–19]. It yields strong matrix signals for many peptides, but the clusters it forms can obscure spectral interpretation in certain mass ranges. One practical solution found to date is to adjust the matrix solution composition to minimize cluster formation. In our experiment, 0.7 mg/mL ammonium citrate was added to the matrix solution to suppress the CHCA signal.
Fig. 1. (A) 2D contour map and 3D plot of one SCX fraction, eluted with 20 mM NH4Ac, then separated by cRPLC and detected by MALDI-MS. The pink arrow marks the peptide with m/z = 2113.31 (1+), which came from the tryptic digest of 60 kDa heat shock protein. (B) The MS/MS spectrum for the precursor ion at 2113.31 (1+), with the sequence identified as ALMLQGVDLLADAVAVTMGPK. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
The concentration of organic solvent in the matrix solution affects crystal formation and distribution, and the crystal morphology affects peptide ionization. In our experiment, crystals formed at a higher ACN concentration, i.e. 60% ACN, were more uniform and had smaller gaps between them than those formed from 50% ACN, which could lead to more efficient sampling of the spot by the laser. The 60% ACN solvent system was therefore adopted for the subsequent experiments. The spotting frequency was set according to the average peak width in the LC separation and the volume of effluent deposited in each spot. The average peak width for fairly resolved peaks in LC was about 0.8 min, so the spotting interval should be shorter than 50 s to retain the LC separation resolution. Given that the MALDI target plate used in the experiment was a blank plate with no wells to hold the sample, the spot volume should be as small as possible to avoid dilution caused by the effluent spreading out on the plate surface. Generally speaking, the more spots deposited in one cRPLC run, the more MS/MS scans can be performed and the more peptides can be identified. However, a tiny droplet is not easy to deposit on the MALDI target, so a compromise has to be made between the spotting frequency and the spot size. In our experience, a spotting frequency of 5 Hz generated droplets of an appropriate size, which were easy to deposit on the target plate and also contained enough sample in each spot for MS/MS spectra acquisition.
3.3. 2D-LC–parallel MS/MS analysis for tryptic digests of proteins extracted from normal human liver tissue
A total of 250 µg of tryptic digests of proteins extracted from normal human liver tissue was separated and analyzed by the 2D-LC–parallel MS/MS system. During the 26 h separation, about 26,400 MS/MS spectra were generated.
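The timing and throughput figures quoted above can be cross-checked with a few lines of arithmetic. This is a hedged sketch: the function names are hypothetical, the 30 s spotting interval is an arbitrary example value rather than a reported setting, and the count of 13 fractions assumes the flow-through plus the 12 salt steps were each run through the full RPLC cycle.

```python
def peak_width_s(peak_width_min):
    """Convert an LC peak width from minutes to seconds."""
    return peak_width_min * 60.0


def spot_volume_nl(flow_ul_per_min, interval_s):
    """Effluent volume collected in one spot for a given spotting interval (nL)."""
    return flow_ul_per_min * 1000.0 * interval_s / 60.0


def hours_per_fraction(total_hours, n_fractions):
    """Average instrument time spent on each SCX fraction."""
    return total_hours / n_fractions


def spectra_per_hour(n_spectra, total_hours):
    """Average MS/MS acquisition rate over the whole run."""
    return n_spectra / total_hours


width_s = peak_width_s(0.8)                  # average peak width from the text
volume_nl = spot_volume_nl(1.8, 30)          # 30 s interval is a hypothetical example
per_fraction_h = hours_per_fraction(26, 13)  # 13 fractions is an assumption
rate = spectra_per_hour(26400, 26)
print(width_s, volume_nl, per_fraction_h, round(rate))
```

A peak width of 0.8 min corresponds to 48 s, which is in line with the stated requirement of a spotting interval shorter than 50 s.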
Based on the parallel analysis by MALDI–TOF–TOF–MS and ESI–QUADRUPOLE–TOF–MS, a total of 3432 unique peptides and 1149 proteins were identified. Fig. 1A shows the 2D contour map and 3D plot of the peptide fraction eluted by 20 mM NH4Ac from SCX, which was separated by cRPLC and detected by MALDI–MS/MS. The peak labeled with the pink arrow represents the precursor ion with m/z = 2113.31, and its corresponding MS/MS spectrum is shown in Fig. 1B, from which the peptide sequence ALMLQGVDLLADAVAVTMGPK could be read out; this peptide contributes to the identification of 60 kDa heat shock protein. Fig. 2A is the total ion chromatogram (TIC) of the 40 mM NH4Ac fraction detected by ESI–MS/MS. The MS spectrum of the LC effluent at 52.49 min is shown in Fig. 2B. The corresponding MS/MS spectrum of the asterisk-labeled peptide (m/z = 854.14) is displayed in Fig. 2C, from which the peptide sequence LVQDVANNTNEEAGDGTTTATVLAR can be read out; this peptide also contributes to the identification of 60 kDa heat shock protein. Among the identified proteins, the largest is titin isoform N2-A, with a Mr of 3813779.5. The smallest is putative metallothionein C20orf127, with a Mr of 5755. 72.1% of the identified proteins have masses between 20,000 and 100,000 Da, 10.0% have masses below
20,000 Da, and 17.9% have masses greater than 100,000 Da. The pI values of all identified proteins lie between 4.14 and 12.26. 13.1% of the proteins are highly basic, with pI above 9. Fig. 3 shows the pI and molecular weight distribution of the identified proteins. Proteins with extreme molecular weight (<20,000 or >100,000) or extreme pI value (especially basic proteins) are difficult to map by the 2D gel method, yet these proteins could be identified in our work. From this point of view, multi-dimensional liquid chromatography coupled with tandem mass spectrometry is a good complement to 2D gel.
3.4. Categorization and description of detected proteins
Categorization of the function and location of the identified proteins was performed with the assistance of gene ontology identification numbers (GO ID #s) downloaded from EBI at http://www.ebi.ac.uk. Cellular location, molecular function and biological process categorization of the detected proteins were based upon GO ID #s. The results shown in Fig. 4 represent 70% of all detected proteins. For the cellular location of the detected proteins illustrated in Fig. 4A, the top three categories are cytoplasm (40%), membrane (19%) and nucleus (9%). Here, the “membrane” category includes membrane, integral to membrane, inner membrane and plasma membrane. About 9% of the proteins belong to the intracellular category and cannot be assigned a more definite location. The high representation (12%) of mitochondrial proteins in the cytoplasm category probably stems from both the better solubility of these proteins (and hence the greater ability to digest and detect their peptides) and their overall higher abundance in the liver tissue. The poorly represented categories are believed to be affected by the solubility conditions and the lower abundance of these proteins (such as Golgi apparatus proteins), since no effort was made to target any specific subset of proteins. In Fig. 4B, it can be seen that the largest group of the identified proteins (43%) have catalytic function and the second largest group (about 40%) have binding function. Proteins with different molecular functions participate in diverse biological processes. Fig. 4C illustrates the biological processes in which the detected proteins are involved. In accordance with the fact that the liver is an important metabolic organ in the human body, the majority (55%) of the identified proteins participate in energy and substance metabolism. Within the metabolism category, the top three groups by number of proteins were metabolism of amino acids and derivatives, electron transfer and protein metabolism. Sarcosine dehydrogenase (SARDH) constitutes one of the major folate binding proteins in liver [20,21]. The disulfide isomerase family and peptidyl-prolyl cis–trans isomerase are enzymes involved in protein structure construction [22]; pyruvate dehydrogenase and malate dehydrogenase are involved in pyruvate oxidation and the citric acid cycle [22]; pyruvate kinase, alpha enolase and phosphoglycerate kinase are enzymes involved in the glycolysis pathway [22]; cathepsin D precursor, proteasome and
Fig. 2. (A) The TIC of the RPLC separation of the 40 mM NH4Ac fraction detected by the ESI-based method. (B) The MS spectrum of peptides eluted at 52.49 min. (C) The MS/MS spectrum for the precursor ion at 854.14 (3+), labeled with asterisks in Fig. 2B, whose sequence was identified as LVQDVANNTNEEAGDGTTTATVLAR, also contributing to the identification of 60 kDa heat shock protein.
ubiquitin carboxyl terminal hydrolase are enzymes involved in intracellular protein breakdown and protein turnover; galactokinase is associated with galactose catabolism [22]. Table 1 shows the identified proteins involved in the electron transport and oxidative phosphorylation occurring in mitochondria, among which 10 proteins were identified by both the ESI-based and MALDI-based analysis routes, 4 proteins only by the ESI-based analysis and 2 proteins only by the MALDI-based analysis.
Detoxification is a characteristic function of the liver. Twelve detoxification-related proteins were identified in our experiment. They mainly belong to two enzyme families: the glutathione-S-transferase (GST) family and the aldo–keto reductase (AKR) superfamily. GSTs are multifunctional proteins. They share common structural and functional properties and catalyse the coupling of glutathione to electrophiles, carcinogens and endogenous compounds [23]. The AKR superfamily includes several main enzyme families. The largest family (AKR1) contains the aldehyde reductases (AKR1A) and the aldose reductases (AKR1B), which were detected in our experiment. In addition to regulating the concentration of several membrane lipids [24], liver carboxylesterase also plays a role in the detoxification of foreign compounds by hydrolyzing ester or amide bonds [25,26].
Fig. 3. Distribution of pI and MW of the identified proteins. Each triangle in the figure represents a unique protein. The square root of MW (MW1/2) was used as the Y-axis because the triangles would crowd together if MW itself were used. In addition, to show the MW distribution more clearly, the protein with the highest molecular weight was removed from the plot.
3.5. Comparison of ESI–MS/MS and MALDI–MS/MS identification
MS scans of the same LC effluent were compared. As expected, multiply charged ions dominate the ions produced by ESI, while singly charged ions dominate those produced by MALDI. ESI–MS/MS and MALDI–MS/MS scans of the same peptides were also compared. Some differences could be observed between the fragmentation patterns of the multiply charged ions of ESI and the singly charged ions of MALDI. The multiply charged peptides produced by ESI tended to give cleaner spectra with fewer internal fragments, while in the MALDI-based method the fragmentation of peptides yielded more y-, b- and a-ions accompanied by ions resulting from neutral loss of ammonia or water; that is to say, more structural information can be obtained from MS/MS spectra generated by MALDI. The fragment ion intensity generated by MALDI–MS/MS was higher than that in ESI–MS/MS, which could be attributed in part to differences in the ionization processes and in part to the different MS/MS acquisition criteria. One principal feature of using the MALDI-based method for the analysis of LC-separated samples is that the duty cycle of the experiment is essentially unlimited, owing to the fact that the chromatographic separation is preserved almost indefinitely in the dried spots on the MALDI target. Consequently, there is a nearly unlimited amount of time available to make decisions about the signals to select for MS/MS. Furthermore, the decision process may be made after the MS analysis of the entire
Fig. 4. Cellular categorization of identified proteins. Categorization was achieved by correlating GO identification numbers corresponding to cellular component, molecular function and biological process with the identified proteins. Values in the figures represent the proportion of proteins found in each category: (A) identified proteins categorized based upon their cellular component; (B) identified proteins categorized based upon their molecular function; (C) identified proteins categorized based upon their biological roles.
Table 1
Proteins within the mitochondrial electron transport and oxidative phosphorylation complexes identified in the global lysate of normal human liver tissue using 2D-LC–parallel-MS/MS

Complex                               Protein accession no. in IPI   Protein name                                               Sequence coverage (%)
Complex I NADH dehydrogenase          IPI00291328                    NADH–ubiquinone oxidoreductase 24 kDa subunit              4.0
Complex II succinate reductase        IPI00010810                    Electron transfer flavoprotein                             10.2
                                      IPI00004902                    Electron transfer flavoprotein beta-subunit                12.7
Complex III cytochrome c reductase    IPI00026964                    Ubiquinol–cytochrome c reductase iron–sulfur subunit       5.1
                                      IPI00305383                    Ubiquinol–cytochrome c reductase complex core protein 2    3.7
Complex IV cytochrome c oxidase       IPI00216085                    Cytochrome c oxidase subunit VIb                           27.9
                                      IPI00025086                    Cytochrome c oxidase polypeptide Va                        6.3
                                      IPI00021785                    Cytochrome c oxidase polypeptide Vb                        9.2
Complex V ATP synthase                IPI00219291                    ATP synthase f chain                                       14.7
                                      IPI00024920                    ATP synthase delta chain                                   8.2
                                      IPI00219567                    ATP synthase gamma chain                                   4.0
                                      IPI00220300                    ATP synthase, H+ transporting                              13.0
                                      IPI00002521                    ATP synthase coupling factor 6                             29.4
                                      IPI00016638                    ATP synthase alpha chain                                   9.4
                                      IPI00303476                    ATP synthase beta chain                                    23.0
Cytochrome c                          IPI00215916                    Cytochrome c                                               10.4
separation is complete. This last property makes it possible to obtain MS/MS spectra for peptides at the chromatographic peak maxima, ensuring the acquisition of the best possible spectra for a given amount of sample. This mode of acquisition differs from an on-line ESI–LC/MS/MS experiment, in which MS/MS spectra are acquired as soon as a signal meets the minimum threshold conditions because the data system has no way of knowing whether the signal will continue to increase or will decrease. In some cases, the intensity difference of the precursor ions made the quality of MS/MS spectra in MALDI better than that in ESI. Consequently, the peptides and proteins identified from MALDI–MS/MS spectra can obtain higher Mascot scores and greater confidence. Take hypothetical protein IPI00334775 as an example: the sequence NPDDITQEEYGEFYK was identified with low confidence (MOWSE score: 29) by the ESI method, whereas the same peptide was reconfirmed with a MOWSE score of 47 by the MALDI method. Similarly, another weakly supported sequence, GVVDSEDLPLNISR (MOWSE score: 25 in ESI–MS/MS), was confirmed with higher confidence (MOWSE score: 68) in MALDI–MS/MS. The average length of peptides identified by MALDI–MS/MS was 16 amino acids, against 14 amino acids by ESI–MS/MS, which indicates that MALDI–MS/MS had a better success rate in identifying larger peptides. In terms of the proteins identified, we compared the hydrophobicity of proteins identified by ESI and MALDI. The hydrophobicity is indicated by the grand average hydrophobicity (GRAVY) values determined according to Kyte and Doolittle [27]. GRAVY values usually vary in the range ±2.0. Positive scores indicate hydrophobicity and negative scores indicate hydrophilicity. There was no obvious difference in the hydrophobicity of the proteins identified by the two MS/MS methods, and the distribution patterns of the GRAVY values were similar.
The rate of positive GRAVY values in ESI–MS/MS and MALDI–MS/MS were 13.2% and 12.7%, respectively. The rate of GRAVY values between −0.5 and 0 in ESI–MS/MS and MALDI–MS/MS were 63.9% and 68.9%.
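The GRAVY calculation referred to above can be illustrated with a minimal Python sketch. It assumes the standard Kyte and Doolittle hydropathy scale [27] and defines GRAVY as the mean hydropathy value over all residues of a sequence. The function names `gravy` and `gravy_bin_rates` are illustrative and are not taken from the software used in this study, and the exact bin boundaries (strictly positive, and −0.5 inclusive to 0 exclusive) are an assumption. The paper reports GRAVY for whole proteins; a peptide from the text is used below only to keep the example short.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    # A non-standard residue letter raises KeyError rather than being skipped.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def gravy_bin_rates(values):
    """Return (fraction with GRAVY > 0, fraction with -0.5 <= GRAVY < 0).

    The bin boundaries are an assumption made for this sketch.
    """
    values = list(values)
    if not values:
        raise ValueError("no GRAVY values given")
    positive = sum(1 for v in values if v > 0) / len(values)
    mildly_hydrophilic = sum(1 for v in values if -0.5 <= v < 0) / len(values)
    return positive, mildly_hydrophilic


# Illustration only: a peptide sequence quoted in the text.
example_value = gravy("GVVDSEDLPLNISR")
```

Applying `gravy` to every identified protein sequence and passing the results to `gravy_bin_rates` would give the two kinds of proportion compared above for each MS/MS route.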
3.6. Benefits of the parallel analysis

The ionization mechanisms of ESI and MALDI are fundamentally different. In addition, the data acquisition criteria and the fragmentation processes in ESI–MS/MS and MALDI–MS/MS scans also differ. As a result, the sets of tryptic peptides detectable by the two approaches are not identical [9]. Compared with a single MS/MS analysis, the parallel analysis strategy takes advantage of the complementary nature of ESI and MALDI and provides more effective identification when coupled with 2D LC for proteome analysis. Two main benefits of the parallel analysis were observed. The first is increased proteome coverage. Although there was a significant overlap (42.4%) between the proteins identified from the two sets of peptides, a considerable number of unique proteins were observed only by LC/ESI/MS/MS (32.1%) or only by LC/MALDI/MS/MS (25.5%) (Fig. 5). This finding indicates that improved proteome coverage is obtained by combining the two techniques, and it is in agreement with other reports (D.H. Patterson, M. Vestal, S. Martin, personal communication, 2002).
Fig. 5. Comparison of the number of overlapping and unique protein assignments obtained by ESI/MS/MS and MALDI/MS/MS analysis of the global lysate extracted from normal human liver tissue specimens.
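The kind of comparison summarized in Fig. 5 can be sketched in Python as a simple set operation on the protein accession lists from the two routes. This is a minimal sketch, not the procedure used in this work: the function name `overlap_summary` and the placeholder accessions are hypothetical, and the percentages are assumed to be expressed relative to the union of both protein lists.

```python
def overlap_summary(esi_proteins, maldi_proteins):
    """Summarize shared and route-specific protein identifications.

    Percentages are relative to the union of both protein sets.
    """
    esi = set(esi_proteins)
    maldi = set(maldi_proteins)
    union = esi | maldi
    if not union:
        raise ValueError("no proteins in either set")
    total = len(union)
    return {
        "total": total,
        "shared_pct": 100.0 * len(esi & maldi) / total,
        "esi_only_pct": 100.0 * len(esi - maldi) / total,
        "maldi_only_pct": 100.0 * len(maldi - esi) / total,
    }


# Hypothetical placeholder accessions, for illustration only.
summary = overlap_summary(
    {"prot_A", "prot_B", "prot_C"},
    {"prot_B", "prot_C", "prot_D", "prot_E"},
)
```

By construction the three percentages add up to 100, as do the values reported in the text (42.4%, 32.1% and 25.5%).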
The other benefit is improved identification confidence. This is demonstrated by the elevated ion scores, the decreased rate of single-peptide hits and the improved peptide coverage. For peptides identified by both ESI and MALDI, differences in the ionization process, fragmentation pattern and fragment ion intensity lead to differences in the quality of the MS/MS spectra. Selecting the better MS/MS spectrum for database searching allows a peptide to be matched with higher confidence, which is directly reflected in an elevated ion score. As for the decreased rate of one-peptide hits, according to the work of Wei and Martens [28,29], one-peptide hits in shotgun methods, especially in ESI–MS/MS, constituted more than 50% of the identification results. In our ESI–MS/MS analysis, 50.1% of the proteins were one-peptide assignments, i.e. 429 proteins were identified with one peptide. Although these one-peptide hits were identified according to the criterion widely used in MudPIT, they are not as confident as proteins identified with two or more peptides. Fortunately, with the identification of additional unique peptides by MALDI, 113 (13.3%) of the proteins matched by a single peptide in ESI were reconfirmed by at least two matching peptides in the parallel analysis. The rate of single-peptide hits decreased from 50.1% in the single ESI–MS/MS analysis to 44.6% in the parallel analysis. This decrease demonstrates the advantage of the combined analysis. Table 2 lists 20 randomly selected proteins that were identified by one peptide in the ESI route and further confirmed by another peptide detected in the MALDI route. The improved peptide coverage can be observed not only for the above-mentioned one-peptide hits but also for some hits with two or more peptides. Take NAD(P) transhydrogenase for example: three peptides were identified
in the ESI method, with the sequences VTIAQGYDALSSMANIAGYK, ILIVGGGVAGLASAGAAK and SLGAEPLEVDLK and a combined ion score of 121, while in the MALDI method two additional peptides, AVVLAANHFGR and GITHIGYTDLPSR, were identified with a combined ion score of 70. The parallel MS/MS analysis therefore identified the protein more reliably, with five matching peptides and a total ion score of 191. UDP-glucuronosyltransferase 2B4 precursor is another example. Two peptides, ANVIASALAKIPQK (ion score: 63) and IHHDQPVKPLDR (ion score: 37), were read out in the ESI method, while another peptide, WIPQNDLLGHPK (ion score: 47), was read out in the MALDI method. The parallel analysis turned this two-peptide assignment in ESI–MS/MS into a three-peptide match, so UDP-glucuronosyltransferase 2B4 precursor was identified with higher confidence. The number of peptides per identified protein can be regarded as a parameter for evaluating identification confidence at the proteome level. This number was 2.8 in the parallel analysis, compared with 2.1 in the single ESI–MS/MS analysis. Fig. 6 compares the proteins identified with different numbers of unique peptides in the ESI–MS/MS analysis and in the parallel analysis. Both the total number of identified proteins and the number of proteins identified with at least two peptides were clearly higher in the combined result, which demonstrates the benefit of the parallel analysis. For the identification of high-abundance proteins in a complicated sample, the combined data are not significantly different
Table 2
Twenty proteins identified by one peptide in ESI–MS/MS method and reconfirmed by another unique peptide in MALDI–MS/MS method (randomly selected)

| Protein accession no. in IPI | Protein name | Sequence read out in ESI–MS/MS | Sequence read out in MALDI–MS/MS | Sequence coverage (%) |
|---|---|---|---|---|
| IPI00003839 | Splice isoform 2 of O75947 ATP synthase D chain, mitochondrial | LAALPENPPAIDWAYYK | NLIPFDQMTIEDLNEAFPETK | 26.8 |
| IPI00008223 | UV excision repair protein RAD23 homolog B | AVEYLLMGIPGDR | NFVVVMVTKPK | 5.8 |
| IPI00012912 | Carnitine O-palmitoyltransferase II, mitochondrial precursor | SIVPTMHYQDSLPR | YILSDSSPAPEFPLAYLTSENR | 5.4 |
| IPI00017672 | Purine nucleoside phosphorylase | VFGFSLITNK | FEVGDIMLIR | 6.8 |
| IPI00022300 | Hypothetical protein FLJ14347 | ELFSNLQEFAGPSGK | FTVIYNEQMASK | 10.8 |
| IPI00024204 | 5-Oxoprolinase (ATP-hydrolysing) | GLPLEVSSEDHMDDGSPIR | GHTACADAYLTPAIQR | 3.0 |
| IPI00024317 | Splice isoform Long of Q92947 glutaryl-CoA dehydrogenase, mitochondrial precursor | EIISEMGELGVLGPTIK | DMLGGNGISDEYHVIR | 7.4 |
| IPI00029020 | PP3856 protein | LDSGDLLQQAQEIR | LLGSDGSPLMDMLQLAEEPVPQAGQELR | 7.7 |
| IPI00100160 | Hypothetical protein | ISGSILNELIGLVR | ADVFHAYLSLLK | 2.8 |
| IPI00101405 | Farnesyl diphosphate synthase | VLTEDEMGHPEIGDAIAR | QDFVQHFSQIVR | 7.1 |
| IPI00178352 | Splice isoform 1 of Q14315 filamin C | LIALLEVLSQK | ILVGPSEIGDASK | 1 |
| IPI00180730 | Tax Id = 9606 | EHALLAYTLGVK | STTTGHLIYK | 5.0 |
| IPI00180776 | Tax Id = 9606 | DSTLIMQLLR | TAFDEAIAELDTLSEESYK | 10.9 |
| IPI00215925 | Glycine N-methyltransferase | SLGVAAEGLPDQYADGEAAR | AWLLGLLR | 9.3 |
| IPI00216171 | Enolase 2 | AAVPSGASTGIYEALELR | SGETEDTFIADLVVGLCTGQIK | 8.8 |
| IPI00216308 | Voltage-dependent anion channel 1 | LTFDSSFSPNTGK | VNNSSLIGLGYTQTLKPGIK | 11.5 |
| IPI00216311 | Villin 2 | IGFPWSEIR | FYPEDVAEELIQDITQKMETGMLK | 5.5 |
| IPI00219953 | UMP–CMP kinase | YGYTHLSAGELLR | SVDEVFDEVVQIFDK | 12.1 |
| IPI00289551 | Sterol/retinol dehydrogenase | SFLEIWDR | CTQDLSLVTNCMEHALIACHPR | 9.3 |
| IPI00289647 | Cytochrome P450 2C18 | YGLLLLLK | DFIDCFLIK | 3.4 |
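The reconfirmation summarized in Table 2 amounts to pooling the unique peptides found for each protein in the two routes and then recounting the single-peptide hits. The Python sketch below illustrates this bookkeeping under stated assumptions: the function names are hypothetical, proteins are keyed by accession or by name purely for illustration, and only three proteins quoted in this paper are used as input, so the resulting rates are not the proteome-level values reported in the text.

```python
from collections import defaultdict


def merge_peptides(*routes):
    """Pool the unique peptide sequences per protein across MS/MS routes."""
    merged = defaultdict(set)
    for route in routes:
        for protein, peptides in route.items():
            merged[protein].update(peptides)
    return dict(merged)


def single_hit_rate(assignments):
    """Fraction of proteins supported by exactly one unique peptide."""
    if not assignments:
        raise ValueError("no protein assignments given")
    singles = sum(1 for peps in assignments.values() if len(peps) == 1)
    return singles / len(assignments)


def peptides_per_protein(assignments):
    """Mean number of unique peptides per identified protein."""
    if not assignments:
        raise ValueError("no protein assignments given")
    return sum(len(peps) for peps in assignments.values()) / len(assignments)


# Peptides quoted in this paper for three proteins (illustration only).
esi = {
    "IPI00003839": {"LAALPENPPAIDWAYYK"},
    "NAD(P) transhydrogenase": {
        "VTIAQGYDALSSMANIAGYK", "ILIVGGGVAGLASAGAAK", "SLGAEPLEVDLK",
    },
    "UDP-glucuronosyltransferase 2B4 precursor": {
        "ANVIASALAKIPQK", "IHHDQPVKPLDR",
    },
}
maldi = {
    "IPI00003839": {"NLIPFDQMTIEDLNEAFPETK"},
    "NAD(P) transhydrogenase": {"AVVLAANHFGR", "GITHIGYTDLPSR"},
    "UDP-glucuronosyltransferase 2B4 precursor": {"WIPQNDLLGHPK"},
}

merged = merge_peptides(esi, maldi)
```

Because peptides are stored as sets, a sequence detected in both routes is counted once, which matches the idea of counting unique peptides per protein.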
Acknowledgment

This work was supported by the National High Technology Development Program: 863-project, 2002AA2Z2042 and Shanghai Key Project of Basic Science Research, Grant Number: 04DZ14005.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the on-line version, at doi:10.1016/j.aca.2006.03.045.

References
Fig. 6. Comparison of proteins identified with different numbers of matching peptides in ESI–MS/MS analysis and parallel analysis.
from the single MS/MS analysis result because these proteins are identified with rather high scores; for example, liver carboxylesterase precursor had a MOWSE score of 1019 from the MALDI method and 908 from the ESI method. However, for proteins whose identification may be somewhat tenuous, especially medium- or low-abundance proteins, combining the results from MALDI and ESI brings the total scores into a range where the proteins can be identified with confidence. For example, glucosidase II alpha subunit was identified only in the ESI results, by three peptides with a total MOWSE score of 45. The protein score from the combined peptide-level TOF–TOF and QUADRUPOLE–TOF data, however, was 95, and the combined result contained five peptides for this protein. This demonstrates that the parallel analysis provided complementary and comprehensive information on a protein and improved the identification confidence of poorly identified proteins.

4. Conclusion

This work combined 2D LC separation with parallel MS/MS identification for proteome analysis of a complicated biosample, i.e. the global lysate of normal human liver tissue. Compared with 1D LC, the 2D LC system provided sufficient resolution for the digests of a complicated protein sample, so we chose 2D LC for combination with the parallel analysis. The micro-scale LC column ensured that adequate sample was injected into the separation system, which satisfied the sample amount required for the parallel analysis. The identification results demonstrated that the complementary nature of MALDI-based and ESI-based analysis could improve proteome coverage and protein identification confidence, especially for the proteins matched by only one peptide in the single MS/MS analysis route. With the parallel identification strategy, more confident and comprehensive information can be gleaned from a complicated biosample without repeating the separation process, so we recommend this identification approach for proteome research on mammalian tissues or cells.
[1] A.W. Moore, J.W. Jorgenson, Anal. Chem. 67 (1995) 3448.
[2] M.T. Davis, J. Beierle, E.T. Bures, M.D. McGinley, J. Mort, J.H. Robinson, C.S. Spahr, W. Yu, R. Luethy, S.D. Patterson, J. Chromatogr. B 752 (2001) 281.
[3] D.A. Wolters, M.P. Washburn, J.R. Yates, Anal. Chem. 73 (2001) 5683.
[4] A.J. Link, J. Eng, D.M. Schieltz, E. Carmack, G.J. Mize, D.R. Morris, B.M. Garvik, J.R. Yates III, Nat. Biotechnol. 17 (1999) 676.
[5] Y. Wang, J. Zhang, C.L. Liu, X. Gu, X.M. Zhang, Anal. Chim. Acta 530 (2005) 227.
[6] S.N. Davinder, L. Liang, J. Chromatogr. A 711 (1995) 235.
[7] Y.J. Zhen, N.F. Xu, B. Richardson, R. Becklin, J.R. Savage, K. Blake, J.M. Peltier, J. Am. Soc. Mass Spectrom. 15 (2004) 803.
[8] J. Malmström, K. Larsen, L. Malmström, E. Tufvesson, K. Parker, J. Marchese, B. Williamson, D. Patterson, S. Martin, P. Juhasz, G. Westergren-Thorsson, G. Marko-Varga, Electrophoresis 24 (2003) 3806.
[9] W.M. Bodnar, R.K. Blackburn, J.M. Krise, M.A. Moseley, J. Am. Soc. Mass Spectrom. 14 (2003) 971.
[10] L.R. Yu, N. Wang, G.D. Wu, Chin. Sci. Bull. 45 (2000) 1113.
[11] J.E. Celis, H. Wolf, M. Östergaard, Electrophoresis 21 (2000) 2115.
[12] A.R. Cole, H. Ji, R.J. Simpson, Electrophoresis 21 (2000) 1772.
[13] T. Minowa, S.S. Ohtsuka, H.M. Kamada, Electrophoresis 21 (2000) 1782.
[14] M.V. Dwek, H.A. Ross, A.J.C. Leathem, Proteomics 1 (2001) 756.
[15] G.H. Ha, S.U. Lee, D.G. Kang, N.Y. Ha, S.H. Kim, J. Kim, J.M. Bae, J.W. Kim, C.W. Lee, Electrophoresis 23 (2002) 2513.
[16] C.H. Lee, H.H. Hsiao, C.W. Lin, S.P. Wu, S.Y. Huang, C.Y. Wu, A.H.J. Wang, K.H. Khoo, Proteomics 3 (2003) 2472.
[17] R.C. Beavis, T. Chaudhary, B.T. Chait, Org. Mass Spectrom. 27 (1992) 156.
[18] T. Miliotis, S. Kjellström, J. Nilsson, T. Laurell, L.E. Edholm, G. Marko-Varga, Rapid Commun. Mass Spectrom. 16 (2002) 117.
[19] H. Thomas, J. Havli, J. Peychl, A. Shevchenko, Rapid Commun. Mass Spectrom. 18 (2004) 923.
[20] C. Wagner, W.T. Briggs, R.J. Cook, Biochem. Biophys. Res. Commun. 127 (1985) 746.
[21] E. Yeo, C. Wagner, Proc. Natl. Acad. Sci. U.S.A. 91 (1994) 210.
[22] C.K. Mathews, K.E. Holde, Biochemistry, 2nd ed., The Benjamin/Cummings Publishing Company, Inc., San Francisco, 1995.
[23] S. Tsuchida, K. Sato, Crit. Rev. Biochem. Mol. Biol. 27 (1992) 337.
[24] R. Mentlein, G. Reuter, E. Heymann, Arch. Biochem. Biophys. 240 (1985) 801.
[25] E.V. Pindel, N.Y. Kedishvili, T.L. Abraham, M.R. Brzezinski, J. Zhang, R.A. Dean, W.F. Bosron, J. Biol. Chem. 272 (1997) 14769.
[26] T. Satoh, Rev. Biochem. Toxicol. 8 (1987) 155.
[27] J. Kyte, R.F. Doolittle, J. Mol. Biol. 157 (1982) 105.
[28] J. Wei, J. Sun, W. Yu, A. Jones, P. Oeller, M. Keller, G. Woodnutt, J.M. Short, J. Proteome Res. 4 (2005) 801.
[29] L. Martens, P.V. Damme, J.V. Damme, A.E. Staes, E. Timmerman, B. Ghesquiere, G.R. Thomas, J. Vandekerckhove, K. Gevaert, Proteomics 5 (2005) 3193.