Liquid Chromatography—Mass Spectrometry-Based Proteomics of Nitrosomonas

Liquid Chromatography—Mass Spectrometry-Based Proteomics of Nitrosomonas

C H A P T E R T W E N T Y- O N E Liquid Chromatography—Mass Spectrometry-Based Proteomics of Nitrosomonas Hans J. C. T. Wessels,* Jolein Gloerich,† ...

939KB Sizes 0 Downloads 47 Views

C H A P T E R

T W E N T Y- O N E

Liquid Chromatography—Mass Spectrometry-Based Proteomics of Nitrosomonas Hans J. C. T. Wessels,* Jolein Gloerich,† Erwin van der Biezen,‡,1 Mike S. M. Jetten,‡,§ and Boran Kartal‡ Contents 466 468 469 470 471 472 476 479 479

1. Introduction 2. LC–MS/MS Instrument Setup 3. Growth of N. eutropha C91 Pure Culture 4. Sample Preparation 5. C18 Reversed Phase LC–MS/MS Analysis 6. Database Searches and Validation of Results 7. Dataset Description and Protein Identification Example 8. Conclusions References

Abstract During the last century, the research on aerobic ammonium-oxidizing bacteria (AOB) lead to many exciting physiological and biochemical discoveries. Nevertheless the molecular biology of AOB is not well understood. The availability of the genome sequences of several Nitrosomonas species opened up new possiblities to use state of the art transcriptomic and proteomic tools to study AOB. With the currect technology, thousands of proteins can be analyzed in several hours of measurement and translated proteins can be detected at femtomole and attomole concentrations. Moreover, it is possible to use mass spectrometrybased proteomics approach to analyze the expression, subcellular localization, posttranslational modifications, and interactions of translated proteins. In this * Nijmegen Centre for Mitochondrial Disorders, Department of Laboratory Medicine, Nijmegen Proteomics Facility, Radboud University Nijmegen Medical Centre, Geert Grooteplein-Zuid 10, Nijmegen, The Netherlands { Department of Laboratory Medicine, Nijmegen Proteomics Facility, Radboud University Nijmegen Medical Centre, Geert Grooteplein-Zuid 10, Nijmegen, The Netherlands { Department of Microbiology, Institute of Water and Wetland Research, Radboud University Nijmegen, Heyendaalseweg 135, The Netherlands } Department of Biotechnology, Delft University of Technology, Delft, The Netherlands 1 Present address: Merck Sharp & Dohme BV, 5342 CC Oss, The Netherlands Methods in Enzymology, Volume 486 ISSN 0076-6879, DOI: 10.1016/S0076-6879(11)86021-0

#

2011 Elsevier Inc. All rights reserved.

465

466

Hans J. C. T. Wessels et al.

chapter, we describe our LC–MS/MS methodology and quality control strategy to study the protein complement of Nitrosomonas eutropha C91.

1. Introduction Ammonium is a major pollutant (toxic to animals and plants, causes eutrophication) in municipal and industrial wastewaters and its discharge to water bodies is regulated worldwide by strict legislation. In natural ecosystems and in conventional wastewater treatment plants, in the presence of O2, ammonium is oxidized to nitrate via nitrite by two clades of microorganisms. The first step of this process is mediated by aerobic ammoniumoxidizing bacteria (AOB or ammonium-oxidizing archaea, AOA) and is followed by the oxidation of nitrite to nitrate by nitrite-oxidizing bacteria (NOB). AOB oxidize ammonium via hydroxylamine to nitrite and there are two major groups of bacteria that are thought to be important: beta- and gammaproteobacterial AOB. In the betaproteobacterial linage, many strains affiliated to two species of Nitrosomonadaceae (Nitrosomonas europaea and Nitrosomonas eutropha) are among the AOB that have been studied over a 100 years since the first isolation of an ammonium-oxidizing microorganism by Winogradsky (1892). Physiological and biochemical studies have resulted in the identification and, in some cases, isolation of main catabolic enzymes of AOB (Arp et al., 2002); still, the molecular biology of AOB is not well understood. For example, several strains of N. eutropha have been shown to oxidize ammonium with NO2 under anoxic or microaerophilic conditions (Schmidt and Bock, 1997) and H2-dependent growth was shown for N. eutropha N904 (Bock et al., 1995; Kampschreur et al., 2008; Schmidt et al., 2001, 2004). However, it is currently unknown how these alternate pathways function or are regulated. Recent publications of the genome sequences of several Nitrosomonas (Chain et al., 2003; Stein et al., 2007), Nitrosococcus (Klotz et al., 2006), and Nitrosospira (Norton et al., 2008) species have facilitated the application of state-of-the-art technologies for the in depth analysis of the molecular mechanism of AOB. The first publication of such an approach was on the transcriptomic analysis of N. europaea (Beyer et al., 2009). This study revealed that mRNA concentrations of catabolic proteins such as amoA (ammonia monooxygenase subunit A) and hao (hydroxylamine oxidoreductase) could be correlated to activity and growth rates of bacteria under aerobic conditions. Under anaerobic conditions, the transcription levels of these genes were reduced and those of nirK (nitrite reductase),

Proteomics of Nitrosomonas

467

norB (nitric oxide reductase), and nsc (nitrosocyanin, red copper protein) were significantly increased. The recent elucidation of the N. eutropha C91 genome sequence enables the use of a mass spectrometry database search driven proteomics to analyze the protein complement of Nitrosomonas in great detail. Analyses of the expression (Lacerda and Reardon, 2009; Phillips and Bogyo, 2005), subcellular localization (Foster et al., 2006), posttranslational modifications (Farley and Link, 2009; Rogers and Foster, 2009), and interactions (Andersen et al., 2003; Monti et al., 2009; Wessels et al., 2009; Wittig and Schagger, 2009) of translated proteins by high-throughput mass spectrometry approaches provide a wealth of information to help unravel biochemical pathways and other cellular processes. Although the analysis of the transcriptome of a given microorganism gives us valuable insights, the current technology allows us to look one step further: amount of translated proteins can now be detected at very low concentrations (i.e., fmol, amol). The sensitive and robust performance of LC–MS/MS-based proteomics makes it an attractive system to use for studying prokaryotic proteomes. Low femtomole to attomole protein sensitivity and high-throughput analysis of samples are crucial when biomass is limited or when multiple samples need to be analyzed within a short time span. Modern LC–MS/MS equipment (when handled correctly) is now capable of identifying and quantifying up to thousands of proteins across multiple samples within just hours of measurement time (Wisniewski et al., 2009). Performance is not only instrument type dependent, but it is also determined by the quality of sample preparation, chromatographic separation, mass spectrometer operation, data processing, and fractionation approach. It is therefore important to optimize (and monitor) the performance of the entire procedure in advance to any large-scale proteomics experiments. Although sample fractionation at protein or peptide level can be used to increase the proteome coverage, it also significantly increases the amount of time and costs needed to perform the experiment (Fang et al., 2010). The tradeoff between an increased proteome coverage and required throughput is thus unique for each research group and question, and is therefore an important aspect to consider when designing experiments. The approach described here was used to study the top 500–1000 most abundant proteins in N. eutropha cells of which many are involved in primary and secondary cellular processes, but may easily be applied to other microorganisms. It is important to realize that parts of the method described here might be equipment specific. Nevertheless we believe that the majority of our approach and quality control can be incorporated into existing protocols using different instrumentation. The method described in this communication is currently our optimal approach for the analysis of the unfractionated N. eutropha C91 proteome.

468

Hans J. C. T. Wessels et al.

2. LC–MS/MS Instrument Setup The LC–MS/MS setup uses a splitless nanoflow liquid chromatograph (Easy nano LC, Proxeon) for C18 reversed phase peptide separations coupled online to a 7 T hybrid quadrupole linear ion trap Fourier-transform ion cyclotron resonance mass spectrometer (Syka et al., 2004; LTQ FT Ultra, Thermo scientific) via a modified nanoelectrospray source (Thermo scientific). Generic performance of the LC–MS/MS system is assessed on a daily basis by evaluating more than 46 performance metrics obtained from a standard Escherichia coli proteome LC–MS/MS measurement using the software of Rudnick et al. (2010). Peptide separations are performed using 15-cm long 100-mm internal diameter emitters (PicoTip emitter FS360100-8-N-5-C15, New Objective) packed in-house with 3 mm particles (120 A˚ pore) C18-AQ ReproSil-Pur reversed phase material (Dr. Maisch GmbH) shown in Fig. 21.1. Column packing is achieved by pressure loading (40 bar) of the C18 particles into the emitter using a pressure bomb and high purity He gas (Ishihama et al., 2002). Upon packing, the emitter outlet forces the particles to form a self-assembled frit as long as the outlet size to particle diameter ratio remains below a 5:1 (Ishihama et al., 2002). The advantages of using these reproducible in-house packed emitters include versatility, very low costs, and, most importantly, high 2.2 kV

Gold wire PEEK micro tee

PEEK line from LC

MS capillary inlet

Packed emitter

Figure 21.1 Schematic representation of the nanoelectrospray setup. The 50 mm internal diameter (ID)  360 mm outer diameter (OD) flow line is connected via a micro tee to both the column and a gold wire. The column, which is a 15 cm long 100 mm ID  360 mm OD emitter packed with 3 mm C18 reversed phase material, is placed just a few millimeters away from the mass spectrometer capillary inlet. The gold wire is used to apply a 2.2 kV electrospray voltage directly to the liquid flow in front of the column via a zero-dead volume micro tee fitting.

469

Proteomics of Nitrosomonas

chromatographic resolution. Hence, there is virtually no possibility of postcolumn peak broadening as peptides are immediately ionized and analyzed when they leave the column outlet. Signal stability and intensity of the nanoelectrospray source depend (amongst other factors) on the quality of the emitter outlet which needs to be evaluated prior to any analytical LC–MS/MS analysis. We consider our ion source stable when the average total ion current (TIC) signal deviation for buffer ions is less than 12% for at least 150 survey spectra acquired at a constant flow of 300 nl min 1 0.5% acetic acid. We recommend to document and monitor performance characteristics of the system on a regular basis. This may help as an early diagnosis for possible performance deterioration and ensures consistent quality throughout large-scale experiments.

3. Growth of N. eutropha C91 Pure Culture A 7-l glass vessel with a working volume of 4 l is used for growing N. eutropha C91 (Fig. 21.2A). The reactor is fed continuously with a flow rate of 1.4 ml min 1. The influent mineral medium is modified from Schmidt and Bock (1997) and contains 50 mM of NH4Cl and following nutrients (per 1 l of medium): 585 mg NaCl, 147 mg CaCl22H2O, 74 mg KCl, 54 mg KH2PO4, 49 mg MgSO4 7H2O, 12 g HEPES, and 1 ml trace elements solution containing 0.02 M HCl, 973 mg FeSO4 7H2O, 49 mg H3BO3, 43 mg ZnSO4 7H2O, 37 mg (NH4)6Mo7O244H2O, 34 mg MnSO4 H2O, 16 mg CuSO4. The pH of the medium is then adjusted A

B

Figure 21.2 (A) N. eutropha C91 pure culture. (B) Fluorescence in situ hybridization micrograph of the N. eutropha C91 pure culture. A triple hybridization with the probes NEU (Wagner et al., 1995), Nso1225 (Mobarry et al., 1996), and EUB338 (Daims et al., 1999) is applied. Bar ¼ 5 mm.

470

Hans J. C. T. Wessels et al.

to 7. A high ammonium concentration (40–50 mM) is recommended for a rapid start-up of the bioreactor. For the homogeneous distribution of substrates, the vessel is stirred at 200 rpm with one six-bladed Rushton and one marine impeller (60 mm). Disappearance of ammonium and production of nitrite should be monitored daily. We use the rapid nitrite and ammonium determination protocols as described in Kartal et al. (2006). A heating blanket is used to keep temperature at 30  C. A gas mixture of Air/N2 (90/10%) with a flow of 500 ml l 1 is sparged through the reactor for O2 supply. A pH controller unit is used to supply a solution of Na2CO3 (100 g l 1) when necessary to keep the pH stable at 7.0. If desired, a dissolved O2 probe may also be used to monitor the dissolved O2 concentration.

4. Sample Preparation The N. eutropha cells (5 l, OD600  0.2; Fig. 21.2B) are centrifuged at 5  C and 4000  g. After centrifugation, the pellet is resuspended in 1 volume 20 mM potassium phosphate buffer, pH 8. Cell suspensions are passed three times through a French pressure cell operated at 138 MPa. After centrifugation for 15 min at 1700  g at 4  C, the cell-free fraction is obtained as clarified supernatant. Proteins are diluted 1:1 (v/v) in 8 M urea 10 mM Tris–HCl pH 8 and placed at room temperature for 30 min to allow for protein denaturation. Reduction of cysteine disulfide bridges is performed by adding 2 ml 10 mM dithiotreitol solution and incubating for 30 min at room temperature. Reforming of disulfide bridges is prohibited via alkylation of reduced cysteines by adding 2 ml of 50 mM chloroacetamide and subsequent incubation for 30 min at room temperature in the dark. The use of chloroacetamide is preferred over iodoacetamide as Nielsen et al. (2008) reported that iodoacetamide induces artifacts that mimic ubiquitination. As trypsin digestion is incompatible with the 4 M urea concentration, a primary incubation of the sample with the protease “LysC” can be performed in a 1:50 (w/w) LysC:protein ratio for 3 h at room temperature. This step results in the cleavage of proteins into smaller peptides through cleavage at the C-terminal side of lysine residues under strong denaturing conditions. Next, the sample is diluted 1:3 (v/v) with 50 mM ammonium bicarbonate to allow trypsin digestion to take place in a less denaturing environment. Trypsin is added in a 1:50 (w/w) trypsin: protein ratio and the sample is placed at 37  C for overnight digestion. Subsequently, the resulting peptide mixture can be concentrated and desalted using stop and go elution (STAGE) tips as described by Rappsilber et al. (2003). A popular alternative to this STAGE solid phase extraction step is the commercially available alternatives such as Millipore ZipTips. However, we prefer the use of in-house constructed STAGE tips

Proteomics of Nitrosomonas

471

because of the low costs and reduced handling steps. Briefly, acidified peptides bind to the activated C18 reversed phase material which is fixed in a standard pipettor tip allowing the peptides to be washed using 0.5% acetic acid and eventually eluted in 20 ml 80% acetonitrile 0.5% acetic acid. The acetonitrile buffer can be removed by centrifugation under vacuum after which samples are taken up in a total volume of 20 ml 0.5% acetic acid prior to LC–MS/MS analysis. We use acetic acid here to dissolve the sample as it is used later on as ion pair reagent in buffers used for C18 reversed phase LC–MS/MS analysis.

5. C18 Reversed Phase LC–MS/MS Analysis For RP LC–MS/MS analysis, we load samples directly onto the analytical column at maximum flow rates determined by a set maximum pressure of 240 bar. Peptides are eluted from the analytical column directly toward the mass spectrometer capillary inlet via a nanoelectrospray source which is used to apply a distal electrospray voltage of 2.2 kV to the liquid flow using a standard micro tee setup (Fig. 21.1). Peptide elution is best performed using an increasing linear gradient of 10–40% acetonitrile in 90 min with a constant concentration of 0.5% acetic acid as ion pair reagent. To prevent carryover, the system is washed by increasing the acetonitrile concentration from 40% to 80% in 5 min followed by a 10 min isocratic flow of 80% acetonitrile at a flow rate of 600 nlmin-1. Finally, the system is equilibrated with 3% acetonitrile at 600 nl min 1 for 10 min. The mass spectrometer settings described hereafter are specific for the type of instrument and manufacturer. In general, they should be optimized for the type of application and sample characteristics and may thus differ significantly between experiments and research groups. For the experimental design described here, we program the LTQ FT Ultra instrument to acquire four data-dependent MS/MS spectra of the four most abundant peptide ions (with charge z ¼ 2þ or 3þ) from a survey scan by the ion cyclotron resonance cell. The acquisition of MS/MS spectra in the ion trap is performed in parallel to a long transient ICR cell survey scan. This parallel acquisition is achieved by selecting precursor ions for collision activated dissociation experiments in the linear ion trap from a first short part of the ICR cell measurement. This allows the ion trap to perform multiple MS/MS analyses while the ICR cell completes the accurate (long transient) measurement. Repetitive MS/MS analysis of identical ions is prohibited by the use of dynamic exclusion with settings optimized for the typical chromatographic performance of our system. Our instrument method uses a dynamic exclusion time of 180 s in combination with early expiration settings that remove m/z values from the dynamic exclusion list when 10

472

Hans J. C. T. Wessels et al.

spectra are recorded with a signal-to-noise ratio below 2 for each individual ion. We highly recommend users to optimize the dynamic exclusion parameters as it may yield more than 100% performance increase for the number of identified unique peptides. Survey scans by the ICR cell are set to cover a range of 350–1600 m/z with a resolution power of 100,000 full width at half maximum (FWHM) at 400 m/z. The low mass cutoff of 350 m/z is used to prevent the detection of buffer ions (that reduce the effective dynamic range for peptide ions) and peptides shorter than six amino acids sequence length. In general, shorter peptides have low protein specificity and significantly contribute to the relative proportion of false positive identifications (Cox and Mann, 2008). Tryptic peptides are generally observed below m/z 1600 which is therefore used as high mass cutoff. Raw data files are evaluated manually to check for chromatographic (over)loading, resolution, and absence of interfering chemical contaminants like polyethylene-glycols. In our opinion, ion maps are ideally suited for this purpose as these (unlike total ion chromatograms) are capable of visualizing all critical characteristics for individual LC–MS/MS runs of very complex samples as shown in Fig. 21.3.

6. Database Searches and Validation of Results Depending on the database search software and the type of mass spectrometer used, raw data needs to be converted to a format that can be used by the software of choice. Many different search engines are available nowadays (both commercial and open source; Dagda et al., 2010; Sadygov et al., 2004) that can be used for protein identification or quantification purposes. Some well-known search engines are Mascot (Perkins et al., 1999), X!Tandem (Craig and Beavis 2004), SEQUEST (Yates et al., 1995), OMSSA (Geer et al., 2004), Peaks studio (Bioinformatics Solutions: http://www.bioinformaticssolutions.com), ProsightPTM (LeDuc et al., 2004), and Phenyx (Colinge et al., 2003). We prefer to use Matrix Science Mascot for which we convert the Thermo raw files into generic mascot files via in-house developed Perl scripts (this can also be performed using freely available software packages; Cox and Mann, 2008; DTA supercharge: http://msquant.sourceforge.net; Pedrioli 2010). All database searches are performed using the Refseq N. eutropha C91 sequences supplemented with sequences of known contaminant proteins. It is important to add these contaminant proteins to the database to ensure that corresponding spectra do not erroneously match N. eutropha C91 sequences. Some of these contaminants are added on purpose during digestion (LysC and Trypsin) whereas other proteins (e.g., skin proteins such as keratins) contaminate samples via the air or reagents. We first perform a loose mass accuracy search of the data allowing a mass error of 20 ppm (which is a relatively large error

473

Proteomics of Nitrosomonas

1400 1300 1200 1100

900

m/z

1000

800 700 600 500 400 20

40

60

80

100

120

Time (min)

Figure 21.3 Ion map view of the analytical LC–MS/MS data acquired from a N. eutropha C91 proteome sample. This map plots the m/z values versus elution time with corresponding intensity in gray scale values. Ions are visible as dots using this large time range. Zooming in on the data allows for manual evaluation of chromatographic performance and intensity-dependent mass accuracy. Constant background ions from fused silica or buffer components are visualized as horizontal lines that may span the complete time range (prominent ions are indicated using arrow heads). Sample contamination by polyethylene-glycols and other (similar) polymers is easily recognizable as a series of diagonal dots with a consistent increase of m/z value and retention time in the analytical part of a reversed phase separation. Polyethylene-glycol ions are encircled in this figure for ease of interpretation. In our experience, we always find low amounts of polyethylene-glycols to be present in reversed phase separations, presumably originating from laboratory plastics or reagents.

for our type of instrument). This database search result is then used to assess the mass accuracy of each measurement and can be used to recalibrate the data offline prior to conclusive database searches. The best suitable method for offline mass recalibration depends on the type of instrument and error (e.g., general transponation, m/z dependent mass error or signal intensitydependent mass error). For this particular measurement, we calculated the average parent ion m/z errors for precursor m/z bins of 100 Th. This data was then used to obtain Eq. (1) which was used for m/z-dependent parent ion m/z recalibrations.

474

Hans J. C. T. Wessels et al.

 ðm=zÞcalibrated ¼ ðm=zÞmeasured  0:0000005  ðm=zÞmeasured  0:0009 : ð21:1Þ This precursor ion recalibration improved mass accuracy from 2.0 ( 1.17) to 0.17 (1.13) ppm as visualized in Fig. 21.4. For the LC–MS/MS dataset of N. eutropha C91 samples, we specified a precursor mass tolerance of 10 ppm for ICR cell spectra of precursor ions and 0.8 Da mass tolerance for fragment ions analyzed by the linear ion trap. Database search parameters include specific trypsin cleavage with a tolerance of one missed cleavage. Furthermore, carboxamidomethylation of cysteines by chloroacetamide is set as fixed modification and variable modifications include N-terminal protein acetylation, deamidation of glutamine and asparagine, and oxidation of methionine. Result files are downloaded automatically from the server using a custom-made Perl script which is triggered by the Mascot Daemon search tool after each search completion. Raw data

6

m/z Error (ppm)

4 2 0 300

500

700

900

1100

1300

1500

1100

1300

1500

–2 –4 –6 Recalibrated data

6

m/z Error (ppm)

4 2 0 300 –2

500

700

900

–4 –6

m/z

Figure 21.4 Parent ion m/z errors for validated peptide identification before (top graph) and after (bottom graph) offline recalibration. Recalibration improved the average precursor ion m/z accuracy from  2.00 (1.17) to  0.17 (1.13) ppm.

Proteomics of Nitrosomonas

475

Subsequent validation of search results is of utmost importance considering the nature and amount of retrieved data. By now there are numerous software packages available for search data parsing and validation purposes such as MSQuant (DTA Supercharge: http://msquant.sourceforge.net), Trans-proteomic pipeline (Keller et al., 2002; Nesvizhskii et al., 2003; Pedrioli, 2010), PROVALT (Weatherly et al., 2005), MaxQuant (Cox and Mann, 2008), Scaffold (Searle, 2010), cot Distiller (Matrix Science: www.matrixscience.com). Many more excellent software solutions may be found in literature, but it is beyond the scope of this chapter to list and describe them all. Successful approaches for large dataset validation often use false discovery rate validation or even posterior error probabilities (local false discovery rates). Both approaches require an additional database search of the data against reversed or randomized sequences from the normal database (Bianco et al., 2009). These decoy search results are used to calculate the false positive rate or posterior error probabilities. In general, both methods require high numbers of matched MS/MS spectra to be accurate and are therefore not suited for every type of sample and/or purpose. We generally use false discovery rates at the protein level for datasets that encompass 10,000–100,000 MS/MS spectra and posterior error probabilities for datasets that include 100,000 or more MS/MS spectra. Validation of smaller datasets can be performed using the random match probabilities calculated by the search engine in combination with additional parameters to ensure confident protein identifications. We currently cannot use FDR-based validation strategies for datasets generated using the described experimental design due to the low number of protein identifications found in decoy database searches (using reversed protein sequences). The decoy database search using the N. eutropha C91 data yielded only 44 protein identifications (exclusively single peptide identifications) with Mascot identification scores in the range of 16–33 (we do not parse peptides with identification scores below 16). When we use this decoy database search result to calculate Mascot score thresholds to achieve a 1% FDR, we find an exceptionally low Mascot peptide identification score threshold of just 27. We therefore apply a low sample complexity validation strategy using the in-house developed PROTON software package (proteomics tools Nijmegen, H. J. C. T. Wessels, J. Gloerich et al., unpublished). The criteria used by PROTON for data validation were empirically determined from previously acquired (much larger) datasets from various organisms and cell types to achieve an overall protein false discovery rate of 1% or better. Criteria used for validation require a minimum Mascot score of 30 for peptides originating from proteins identified with multiple unique peptide sequences. Single peptide matches require a minimum Mascot score of 49 and a modified delta score of at least 10. The modified delta score is calculated as the difference in score between the first and second best unique sequence match for a given

476

Hans J. C. T. Wessels et al.

spectrum. This value differs from the original delta score which is calculated between the first and second match, regardless of the peptide sequences, and may thus differ only by the position of a variable modification site. We further allow a maximum of three variable modifications for individual peptide matches. When we apply the same validation criteria to the decoy database search result, we can calculate an infinitely small false discovery rate for our data, which is an underestimation of the true FDR due to the low number of decoy database hits. Our software is also used to calculate estimated protein abundance for all identified proteins using the exponentially modified protein abundance index (emPAI; Ishihama et al., 2005). This spectral count method uses the correlation between the number of identified unique peptide ions (normalized between proteins for the number of theoretically detectable peptides) and protein concentration (Ishihama et al., 2005; Rappsilber et al., 2002). The number of theoretically detectable peptides can be calculated in-silico for each protein using the physicochemical properties of predicted peptides in combination with the mass spectrometer settings. Ideally, the physicochemical cutoff values are calculated a priori using the properties of identified peptides from large datasets that are generated using an identical protocol. The exponential modification of the PAI value is needed to obtain a linear correlation between PAI and protein concentration (Ishihama et al., 2005). Calculated emPAI values may be used to compare abundances between different proteins in one or multiple samples. It is important to note that a comparison of protein abundances between samples requires the samples to be comparable in terms of complexity and data quality. A limitation of this technique can be its accuracy which depends on the number of theoretically detectable peptides and the number of identified unique ions. We therefore emphasize that care should be taken when using emPAI values of low abundant and/or small proteins.

7. Dataset Description and Protein Identification Example A single LC–MS/MS analysis of nonfractionated N. eutropha C91 proteome using our system identified about 2000 unique peptides corresponding to  600 unique proteins (24% coverage of the predicted proteome; Kartal et al., unpublished). On average, each protein is identified using four unique peptides. Representative protein sequence coverages are shown as a relative frequency histogram in Fig. 21.5. These were calculated as the percentage of identified amino acid residues using unique valid peptide identification sequences. As an example, one of the most abundant proteins identified by our LC–MS/MS analysis is the alkyl hydroperoxide

Proteomics of Nitrosomonas

477

30

Relative frequency (%)

25 20 15 10 5 0 0

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 % Protein sequence coverage

Figure 21.5 Protein sequence coverage histogram of N. eutropha C91. Values on the x-axis represent upper values for each bin whereas the y-axis is used to show the relative frequency for each respective bin. This figure shows, for example, that about 25% of all proteins in the dataset were identified with a sequence coverage between 5% and 10%.

reductase (Neut_2366, abundance based on emPAI). This protein is likely to function as a hydrogen peroxide scavenger to protect N. eutropha C91 cells from oxidative damage. This protein is identified using nine unique peptides with combined protein sequence coverage of 60% (Table 21.1) and a Mascot protein identification score of 513. This protein identification score implies a chance of less than 1  10 51 for this protein to be a random event. The data in Table 21.1 shows all relevant characteristics generally used to describe protein identifications. Details for the protein identification at both the protein and peptide level are included in this table. From information at the protein level, it may be deduced that we predicted a total of 11 peptides to be theoretically detectable using our approach, of which nine peptides are identified using 13 unique MS/MS spectra. These data are used to calculate a protein abundance of 14.2 using emPAI. The peptide level information shows each best nonredundant peptide match data in detail. The first series of columns describe each peptide identification at the mass spectrometry level and include the mass over charge (m/z) value of each peptide ion, measured peptide mass as Mr(exp), theoretical mass of the peptide Mr(thr), peptide mass error in parts per million, and charge state of the ion. The next series of columns contain the database search information. First, the number of missed cleavage sites by trypsin is shown (miss) followed by the peptide match score and the expected value for the match to be a random event. The modified delta score is also shown (mDelta) which is the difference in Mascot score between the best and second best peptide sequence match. Please note that there is no other candidate peptide

Table 21.1

Identification data for the abundant alkyl hydroperoxide reductase (Neut_2366) of N. eutropha C91

Detectable peptides: 11 Detected peptides: 9

Matched ions: 13

Score: 513

emPAI: 14.2 Peptide match

m/z

Mr(exp)

Mr(thr)

ppm

Charge Miss Score Expect

mDelta score Residues

826.879 631.3103 572.28 646.8183

1651.7434 1890.9091 1142.5455 1291.6221

1651.7478 1890.9112 1142.5469 1291.6231

 2.69  1.1  1.19  0.74

2 3 2 2

0 0 0 0

67 35 48 37

3.4E-07 0.00035 0.000017 0.00022

67 34.5 47.6 36.5

13–27 64–80 81–90 92–102

809.921 651.8566 600.8325 393.7266 852.9034

1617.8275 1301.6987 1199.6505 785.4387 1703.7922

1617.825 1301.6979 1199.651 785.4395 1703.7937

1.57 0.6  0.44  1.07  0.91

2 2 2 2 2

0 0 0 0 0

90 34 74 53 75

1.5E-09 0.00064 3.6E-08 5.6E-06 3.1E-08

86.9 33.9 74.4 43.1 75.1

107–120 121–132 133–143 144–150 154–169

Protein: YP_748545.1—alkyl hydroperoxide reductase (21 kDa).

K.ATAYHNGDFVEVSDK.T K.AGIEVYSVSTDTHFTHK.A K.AWHDSSSVVR.K K.VQYPMIGDPTR.Q þ Oxidation (M) R.DFDVLIEEEGLALR.G R.GSFIINPEGOIK.A K.ALEVQDLSIGR.N R.NAAELLR.K K.AAQYVSTHSGEVCPAK.W

Proteomics of Nitrosomonas

479

match if the modified delta score equals the Mascot match score. Here, we also report the amino acid residue numbers from the protein sequence that correspond with each peptide match. Finally, the peptide match sequence is shown along with detected variable modifications (if any) and residues from the protein sequence that flank the peptide match.

8. Conclusions Here, we describe the application of LC–MS/MS-based proteomics for the detection of nonfractionated proteome of N. eutropha C91. With the implementation of such state of the art technology, future studies could focus on the molecular mechanism of ammonium oxidation by N. eutropha C91 under different growth conditions. Moreover, it is also possible to apply similar protocols to any other microorganism with an available genome sequence.

REFERENCES Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P., Nigg, E. A., and Mann, M. (2003). Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570–574. Arp, D. J., Sayavedra-Soto, L. A., and Hommes, N. G. (2002). Molecular biology and biochemistry of ammonia oxidation by Nitrosomonas europaea. Arch. Microbiol. 178, 250–255. Beyer, S., Gilch, S., Meyer, O., and Schmidt, I. (2009). Transcription of genes coding for metabolic key functions in Nitrosomonas europaea during aerobic and anaerobic growth. J. Mol. Microbiol. Biotechnol. 16, 187–197. Bianco, L., Mead, J. A., and Bessant, C. (2009). Comparison of novel decoy database designs for optimizing protein identification searches using ABRF sPRG2006 STANDARD MS/MS data sets. J. Proteome Res. 8, 1782–1791. Bock, E., Schmidt, I., Stuven, R., and Zart, D. (1995). Nitrogen loss caused by denitrifying Nitrosomonas cells using ammonium or hydrogen as electron donors nitrite as electron acceptor. Arch. Microbiol. 163, 16–20. Chain, P., Lamerdin, J., Larimer, F., Regala, W., Lao, V., Land, M., Hauser, L., Hooper, A., Klotz, M., Norton, J., Sayavedra-Soto, L., Arciero, D., et al. (2003). Complete genome sequence of the ammonia-oxidizing bacterium and obligate chemolithoautotroph Nitrosomonas europaea. J. Bacteriol. 185, 2759–2773. Colinge, J., Masselot, A., Giron, M., Dessingy, T., and Magnin, J. (2003). OLAV: Towards high-throughput tandemmass spectrometry data identification. Proteomics 3, 1454–1463. Cox, J., and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372. Craig, R., and Beavis, R. C. (2004). TANDEM: Matching proteins with tandem mass spectra. Bioinformatics 9, 1466–1467.

480

Hans J. C. T. Wessels et al.

Dagda, R. K., Sultana, T., and Lyons-Weiler, J. (2010). Evaluation of the consensus of four peptide identification algorithms for tandem mass spectrometry based proteomics. J. Proteomics Bioinform. 3, 39–47. Daims, H., Bruehl, A., Amann, R., Schleifer, K. H., and Wagner, M. (1999). The domainspecific probe EUB338 is insufficient for the detection of all bacteria: Development and evaluation of a more comprehensive probe set. Syst. Appl. Microbiol. 22, 434–444. Fang, Y., Robinson, D. P., and Foster, L. J. (2010). Quantitative analysis of proteome coverage and recovery rates for upstream fractionation methods in proteomics. J. Proteome Res. 9, 1902–1912. Farley, A. R., and Link, A. J. (2009). Identification and Quantification of Protein Posttranslational Modifications. Methods Enzymol. 463, 725–763. Foster, L. J., de Hoog, C. L., Zhang, Y. L., Zhang, Y., Xie, X. H., Mootha, V. K., and Mann, M. (2006). A mammalian organelle map by protein correlation profiling. Cell 125, 187–199. Geer, L. Y., Markey, S. P., Kowalak, J. A., Wagner, L., Xu, M., Maynard, D. M., Yang, X., Shi, W., and Bryant, S. H. (2004). Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964. Ishihama, Y., Rappsilber, J., Andersen, J. S., and Mann, M. (2002). Microcolumns with selfassembled particle frits for proteomics. J. Chromatogr. A 979, 233–239. Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J., and Mann, M. (2005). Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 4, 1265–1272. Kampschreur, M. J., Tan, N. C. G., Kleerebezem, R., Picioreanu, C., Jetten, M. S. M., and Loosdrecht, M. C. M. (2008). Effect of dynamic process conditions on nitrogen oxides emission from a nitrifying culture. Environ. Sci. Technol. 42, 429–435. Kartal, B., Koleva, M., Arsov, R., van der Star, W., Jetten, M. S. M., and Strous, M. (2006). Adaptation of a freshwater anammox population to high salinity wastewater. J. Biotechnol. 126, 546–553. Keller, A., Nesvizhskii, A. I., Kolker, E., and Aebersold, R. (2002). Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392. Klotz, M. G., Arp, D. J., Chain, P. S., El-Sheikh, A. F., Hauser, L. J., Hommes, N. G., Larimer, F. W., Malfatti, S. A., Norton, J. M., Poret-Peterson, A. T., Vergez, L. M., and Ward, B. B. (2006). Complete genome sequence of the marine, chemolithoautotrophic, ammonia-oxidizing bacterium Nitrosococcus oceani ATCC 19707. Appl. Environ. Microbiol. 72, 6299–6315. Lacerda, C. M., and Reardon, K. F. (2009). Environmental proteomics: Applications of proteome profiling in environmental microbiology and biotechnology. Brief. Funct. Genomic. Proteomic. 8, 75–87. LeDuc, R. D., Taylor, G. K., Kim, Y. B., Januszyk, T. E., Bynum, L. H., Sola, J. V., Garavelli, J. S., and Kelleher, N. L. (2004). ProSight PTM: An integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res. 32(Web Server issue), W340–W345. Mobarry, B. K., Wagner, M., Urbain, V., Rittmann, B. E., and Stahl, D. A. (1996). Phylogenetic probes for analyzing abundance and spatial organization of nitrifying bacteria. Appl. Environ. Microbiol. 62, 2156–2162. Monti, M., Cozzolino, M., Cozzolino, F., Vitiello, G., Tedesco, R., Flagiello, A., and Pucci, P. (2009). Puzzle of protein complexes in vivo: A present and future challenge for functional proteomics. Expert Rev. Proteomics 6, 159–169. Nesvizhskii, A. I., Keller, A., Kolker, E., and Aebersold, R. (2003). A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658.

Proteomics of Nitrosomonas

481

Nielsen, M. L., Vermeulen, M., Bonaldi, T., Cox, J., Moroder, L., and Mann, M. (2008). Iodoacetamide-induced artifact mimics ubiquitination in mass spectrometry. Nat. Methods 5, 459–460. Norton, J. M., Klotz, M. G., Stein, L. Y., Arp, D. J., Bottomley, P. J., Chain, P. S., Hauser, L. J., Land, M. L., Larimer, F. W., Shin, M. W., and Starkenburg, S. R. (2008). Complete genome sequence of Nitrosospira multiformis, an ammonia-oxidizing bacterium from the soil environment. Appl. Environ. Microbiol. 74, 3559–3572. Pedrioli, P. G. (2010). Trans-proteomic pipeline: A pipeline for proteomic analysis. Methods Mol. Biol. 604, 213–238. Perkins, D. N., Pappin, D. J. C., Creasy, D. M., and Cottrell, J. S. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567. Phillips, C. I., and Bogyo, M. (2005). Proteomics meets microbiology: Technical advances in the global mapping of protein expression and function. Cell. Microbiol. 7, 1061–1076. Rappsilber, J., Ryder, U., Lamond, A. I., and Mann, M. (2002). Large-scale proteomic analysis of the human splicesome. Genome Res. 12, 1231–1245. Rappsilber, J., Ishihama, Y., and Mann, M. (2003). Stop and go extraction tips for matrixassisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670. Rogers, L. D., and Foster, L. J. (2009). Phosphoproteomics—finally fulfilling the promise? Mol. Biosyst. 5, 1122–1129. Rudnick, P. A., Clauser, K. R., Kilpatrick, L. E., Tchekhovskoi, D. V., Neta, P., Blonder, N., Billheimer, D. D., Blackman, R. K., Bunk, D. M., Cardasis, H. L., Ham, A. J. L., Jaffe, J. D., et al. (2010). Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Mol. Cell. Proteomics 9, 225–241. Sadygov, R. G., Cociorva, D., and Yates, J. R. (2004). Large-scate database searching using tandem mass spectra: Looking up the answer in the back of the book. Nat. Methods 1, 195–202. Schmidt, I., and Bock, E. (1997). Anaerobic ammonia oxidation with nitrogen dioxide by Nitrosomonas eutropha. Arch. Microbiol. 167, 106–111. Schmidt, I., Bock, E., and Jetten, M. S. M. (2001). Ammonia oxidation by Nitrosomonas eutropha with NO2 as oxidant is not inhibited by acetylene. Microbiology UK 147, 2247–2253. Schmidt, I., van Spanning, R. J. M., and Jetten, M. S. M. (2004). Denitrification and ammonia oxidation by Nitrosomonas europaea wild-type, and NirK- and NorB-deficient mutants. Microbiology UK 150, 4107–4114. Searle, B. C. (2010). Scaffold: A bioinformatic tool for validating MS/MS-based proteomic studies. Proteomics 10, 1265–1269. Stein, L. Y., Arp, D. J., Berube, P. M., Chain, P. S. G., Hauser, L., Jetten, M. S. M., Klotz, M. G., Larimer, F. W., Norton, J. M., den Camp, H., Shin, M., and Wei, X. M. (2007). Whole-genome analysis of the ammonia-oxidizing bacterium, Nitrosomonas eutropha C91: Implications for niche adaptation. Environ. Microbiol. 9, 2993–3007. Syka, J. E. P., Marto, J. A., Bai, D. L., Horning, S., Senko, M. W., Schwartz, J. C., Ueberheide, B., Garcia, B., Busby, S., Muratore, T., Shabanowitz, J., and Hunt, D. F. (2004). Novel linear quadrupole ion trap/FT mass spectrometer: Performance characterization and use in the comparative analysis of histone H3 post-translational modifications. J. Proteome Res. 3, 621–626. Wagner, M., Rath, G., Amann, R., Koops, H. P., and Schleifer, K. H. (1995). In-situ identification of ammonia-oxidizing bacteria. Syst. Appl. Microbiol. 18, 251–264.

482

Hans J. C. T. Wessels et al.

Weatherly, D. B., Atwood, J. A., Minning, T. A., Cavola, C., Tarleton, R. L., and Orlando, R. (2005). A heuristic method for assigning a false-discovery rate for protein identifications from mascot database search results. Mol. Cell. Proteomics 4, 762–772. Wessels, H., Vogel, R. O., van den Heuvel, L., Smeitink, J. A., Rodenburg, R. J., Nijtmans, L. G., and Farhoud, M. H. (2009). LC-MS/MS as an alternative for SDSPAGE in blue native analysis of protein complexes. Proteomics 9, 4221–4228. Winogradsky, S. (1892). Contributions a la morphologie des organismes de la nitrification. Arch. Sci. Biol. 1, 86–137. Wisniewski, J. R., Zougman, A., Nagaraj, N., and Mann, M. (2009). Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362. Wittig, I., and Schagger, H. (2009). Native electrophoretic techniques to identify proteinprotein interactions. Proteomics 9, 5214–5223. Yates, J. R., Eng, J. K., McCormack, A. L., and Schieltz, D. (1995). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989.