Journal of Proteomics 76 (2012) 141–149
Available online at www.sciencedirect.com
www.elsevier.com/locate/jprot
Identifying mutated proteins secreted by colon cancer cell lines using mass spectrometry☆
Suresh Mathivanan a,b, Hong Ji a,b, Bow J. Tauro a,b, Yuan-Shou Chen b,c, Richard J. Simpson a,b,⁎
a Department of Biochemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
b Ludwig Institute for Cancer Research, Parkville, Victoria 3055, Australia
c Core Instrument Center, National Health Research Institutes, Miaoli, Taiwan
Article info: Available online 13 July 2012
Abstract
Secreted proteins encoded by mutated genes (mutant proteins) are a particularly rich source of biomarkers, because they are not only components of the cancer secretome but are also directly implicated in tumorigenesis. One of the challenges of proteomics-driven biomarker discovery research is that most secreted mutant proteins cannot be directly identified and quantified by mass spectrometry, because extant proteomics databases lack mutated peptide information. Here, using an integrated genomics and proteomics strategy (referred to as iMASp, identification of Mutated And Secreted proteins), we identify 112 putative mutated tryptic peptides (corresponding to 57 proteins) in the collective secretomes derived from a panel of 18 human colorectal cancer (CRC) cell lines. Central to iMASp was the creation of the Human Protein Mutant Database (HPMD), against which experimentally derived secretome peptide spectra were searched. Eight of the identified mutated tryptic peptides were confirmed by RT-PCR and cDNA sequencing of RNA extracted from the CRC cells in which the mutation had been identified by mass spectrometry. The iMASp technology promises to improve the link between proteomics and genomic mutation data, thereby providing an effective tool for targeting tryptic peptides with mutated amino acids as potential cancer biomarker candidates. This article is part of a Special Issue entitled: Integrated omics. © 2012 Elsevier B.V. All rights reserved.

Keywords: Protein mutation; Plasma biomarkers; Integrated genomics and proteomics; CRC specific biomarkers
Abbreviations: SNPs, Single Nucleotide Polymorphisms; iMASp, identification of Mutated And Secreted proteins; CRC, colorectal cancer; HPMD, Human Protein Mutant Database; CEA, carcinoembryonic antigen; PSA, prostate-specific antigen; FAP, familial adenomatous polyposis; HNPCC, hereditary nonpolyposis colorectal cancer; APC, adenomatous polyposis coli; SRM, selected reaction monitoring; CM, culture media; LDH, lactate dehydrogenase; ITS, Insulin–Transferrin–Selenium.
☆ This article is part of a Special Issue entitled: Integrated omics.
⁎ Corresponding author at: Department of Biochemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria 3086, Australia. Tel.: +61 3 9479 3099; fax: +61 3 9479 1226. E-mail address: [email protected] (R.J. Simpson).
1874-3919/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jprot.2012.06.031

1. Introduction

Cancer arises as a consequence of accumulated mutations in key proteins that regulate cell proliferation, differentiation and death [1]. Recent DNA sequencing studies have revealed complex cancer genomes, typically with ~40–100 amino acid-altering mutations [2–4]. While the majority of these mutations provide no clonal growth advantage ('passenger' mutations), a small fraction are responsible for cancer initiation and tumor progression ('driver' mutations) [5,4]. Detecting such mutated proteins, especially tryptic peptides containing the mutated residues, in bodily fluids such as plasma or serum therefore offers an ideal opportunity for biomarker development [6]. Moreover, mutant proteins that are expressed only in cancer cells provide the specificity that currently available biomarkers, such as carcinoembryonic antigen (CEA) or prostate-specific antigen (PSA), lack.

In colorectal cancer (CRC), disease progression is typically accompanied by a series of genetic changes that affect at least one oncogene and several tumor suppressor genes [7]. The majority of CRCs (~70%) occur sporadically, with no prior family history [8]; inherited (5–10%) and familial (up to 25%) CRCs make up the remainder. Inherited CRCs result from a single germline mutation and can be divided into two main categories, familial adenomatous polyposis (FAP) and hereditary nonpolyposis colorectal cancer (HNPCC) [8]. In FAP, germline mutations occur in the adenomatous polyposis coli (APC) gene [9], whereas HNPCC is associated with mutations in proofreading genes coding for mismatch repair enzymes such as MSH2, MLH1, PMS1 and PMS2 [10]. Familial CRC is thought to be associated with genetic polymorphisms such as the APC I1307K mutation, which is found in 6% of the Ashkenazi Jewish population and in 28% of Ashkenazi Jews with a family history of CRC [11]. In contrast, sporadic CRC is reported to result from the accumulation of multiple somatic mutations that give rise to adenomas and carcinomas [12]. While approximately 80% of sporadic CRCs have an APC mutation, mutations are also reported in KRAS, SRC, C-MYC, ERBB2, EGFR, TP53, BRAF [1], PIK3CA, DCC, SMAD4, CTNNB1/β-catenin and TGFBR2 [13].
After decades of investigation into the mechanisms underlying cancer pathogenesis, we are now able to systematically analyze the genetic and phenotypic makeup of tumor cells. Advances in high-throughput techniques make it possible to profile the tumors of CRC patients at the molecular level, with an emphasis on mutated genes. Because several gene mutations are prevalent in CRC, anti-tumor drug development programs focus on targeting tumor-specific mutations. Given the importance of mutations, it is pivotal to systematically catalog such oncogenic mutations in CRC. Mutations can be readily identified at the DNA level, but confirming that the mutated gene is expressed at the RNA and protein levels requires further validation [14–16]; a method that can systematically identify mutations at the protein level is therefore needed. Recently, Wang et al. explored the possibility of exploiting mutant KRAS protein as a CRC biomarker [6]. That study quantified mutant KRAS, a somatic driver of tumorigenesis, in CRC and pancreatic tumor tissues using the mass spectrometry (MS)-based technique selected reaction monitoring (SRM) [17]. Additionally, specimens of pancreatic cyst fluid were assessed for the levels of peptides derived from mutant KRAS. It should be noted that the study by Wang et al. [6] used an intracellular protein as a biomarker, which precludes its use as a circulating biomarker for early disease diagnosis. In contrast, mutant proteins secreted by tumor cells could provide highly specific biomarkers for early diagnosis and for monitoring disease progression.
Given the clinical importance of plasma/serum-based biomarker discovery for the early detection and management of CRC [18–21], we combined proteomic and bioinformatic tools to map the secretome of 18 cell lines representing different stages of human CRC development. This method, which we have termed identification of Mutated And Secreted proteins (iMASp), allowed the identification, for the first time, of mutated secreted proteins using high-resolution mass spectrometry. Although we have focused on the extracellular proteome, iMASp is equally applicable to any proteomic study, as well as to genome-wide association studies (to discriminate bona fide genomic alterations associated with specific diseases at the protein level).
2. Materials and methods
2.1. Cell culture
Human colorectal carcinoma cell lines LIM1215, LIM1863, LIM1899, LIM2099, LIM2405, LIM2408, LIM2537, LIM2550 and LIM2551 [22,23] were obtained from the Ludwig Institute for Cancer Research in Melbourne and routinely cultured in RPMI 1640 medium (Invitrogen) containing 10 μM α-thioglycerol, 25 U/L insulin and 1 mg/L hydrocortisone. SW1222 and T84 cells were from R. Whitehead and J. K. Heath (Ludwig Institute, Melbourne), respectively; these and HCT-15 (ATCC no. CCL-225) cells were cultured in RPMI 1640 medium. MCF-7 (breast cancer cells from H-J. Zhang, Dept. Surgery, Melbourne University), U87 (glioma cells from A. M. Scott, Ludwig Institute at the Austin Hospital, Melbourne), Caco-2 (ATCC no. HTB-37), HCA-7 (ECACC no. 02091238) and LS174T (from S. Stacker, Ludwig Institute, Parkville, Melbourne) cells were cultured in DMEM medium (Invitrogen). LoVo (from S. Stacker) and SW1463 (ATCC no. CCL-234) cells were cultured in Leibovitz's L-15 Medium (Invitrogen). HCT 116 (ATCC no. CCL-247) and HT29 (ATCC no. HTB-38) cells were cultured in McCoy's 5A medium (Invitrogen). All cell culture media were supplemented with 10% FCS (Invitrogen), 60 μg/mL benzylpenicillin and 100 μg/mL streptomycin (Sigma), except for the LIM1863 culture medium, which was supplemented with only 5% FCS. All cell lines were cultured at 37 °C in a 10% CO2 atmosphere.
2.2. Preparation of cell culture media (CM)
To generate CM from the colon carcinoma and non-CRC (breast and glioma) cell lines, approximately 2 × 10⁶ cells were plated in 25 mL of the routine culture medium in 150 mm diameter culture dishes (10 dishes per cell line). When the cells reached 70–80% confluence, they were washed twice with the routine culture medium and twice with phenol red-free routine culture medium. The cells were then cultured for 24 h in 15 mL of phenol red-free culture medium supplemented with 0.8% ITS, 60 μg/mL benzylpenicillin and 100 μg/mL streptomycin. The CM were collected and centrifuged at 480 × g for 5 min to remove floating cells, followed by a second centrifugation at 2000 × g for 10 min to remove cell debris. The conditioned medium was concentrated from 50 mL to ~1 mL using an Amicon Ultra-15 centrifugal filter device with a 5000 nominal molecular weight limit (Millipore). The concentrated CM were centrifuged for 1 h at 100,000 × g to remove secreted membranous vesicles. The resulting CM were stored at −80 °C until further analysis.
2.3. GeLC–MS/MS
Conditioned medium proteins (30 μg) from each cell line were separated by SDS-PAGE and visualized by staining with Imperial Protein Stain (Pierce). Gel lanes were cut into 20 × 2 mm bands using a GridCutter (The Gel Company, San Francisco, CA), and individual bands were subjected to in-gel reduction, alkylation and trypsinization as previously described [24,25]. Briefly, gel bands were reduced with 10 mM DTT (Calbiochem) for 30 min, alkylated for 20 min with 25 mM iodoacetic acid (Fluka) and digested with 150 ng trypsin (Worthington) for 4.5 h at 37 °C. Tryptic peptides were extracted with 50 μL of 50% (v/v) acetonitrile, 50 mM ammonium bicarbonate, concentrated to ~10 μL by centrifugal lyophilization and analyzed by LC–MS/MS. RP-HPLC was performed on a nanoAcquity® (C18) 150 × 0.15 mm internal diameter reversed-phase UPLC column (Waters) using an Agilent 1200 HPLC coupled online to an LTQ-Orbitrap mass spectrometer equipped with a nanoelectrospray ion source (Thermo Fisher Scientific). The column was developed with a linear 60-min gradient from 0 to 100% solvent B at a flow rate of 0.8 μL/min and 45 °C, where solvent A was 0.1% (v/v) aqueous formic acid and solvent B was 0.1% (v/v) aqueous formic acid/60% acetonitrile. Survey MS scans were acquired at a resolution of 30,000. Real-time recalibration was performed using a background ion from ambient air in the C-trap. Up to five selected target ions were dynamically excluded from further analysis for 3 min.
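Which peptides a search engine considers depends on the digestion rule and on the number of missed cleavages allowed (up to three in the database searches of Section 2.4). The following is a minimal Python sketch of in-silico tryptic digestion. It assumes the commonly used simplified rule: cleave C-terminal to Lys or Arg, except before Pro. The function name `tryptic_peptides` is hypothetical, and this is not the digestion code used by MASCOT or X!Tandem.

```python
def tryptic_peptides(seq, missed_cleavages=0):
    """Digest a protein sequence in silico with a simplified trypsin rule.

    Cleaves C-terminal to K or R unless the next residue is P, then also
    returns peptides spanning up to `missed_cleavages` uncut sites.
    """
    # exclusive end indices of fragments: cut after K/R not followed by P
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]

    peptides = []
    for i in range(len(fragments)):
        # join fragment i with up to `missed_cleavages` following fragments
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # made-up sequence embedding the CST3 segment discussed in Section 3.4
    print(tryptic_peptides("MKVSPATGSSPGKPPRLLK"))
    print(tryptic_peptides("MKVSPATGSSPGKPPRLLK", missed_cleavages=1))
```

For the made-up sequence above, the Lys–Pro bond inside VSPATGSSPGKPPR is left intact, so the default call returns ['MK', 'VSPATGSSPGKPPR', 'LLK']. Raising `missed_cleavages` to 1 adds the two joined fragments.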
2.4. Database searching and protein identification
Peak lists were generated using extract-msn as part of Bioworks 3.3.1 (Thermo Fisher Scientific), as previously described [26], with the following parameters: minimum mass 500; maximum mass 5000; grouping tolerance 0.01 Da; intermediate scans 200; minimum group count 1; minimum of 10 peaks; and TIC of 100. Peak lists for each LC–MS/MS run were merged into a single MGF file. Automatic charge state recognition was used because of the high-resolution survey scan (30,000). LC–MS/MS spectra were searched against the NCBI RefSeq protein database [27] in a target-decoy fashion using MASCOT (v2.2.01, Matrix Science, U.K.) and X!Tandem (version 2008.12.01.1). Search parameters were: fixed modification, carboxymethylation of cysteine (+58 Da); variable modification, oxidation of methionine (+16 Da); up to three missed tryptic cleavages; 20 ppm peptide mass tolerance; and 0.8 Da fragment ion mass tolerance. Peptide identifications with a MASCOT ion score greater than the identity score (MASCOT) and an e-value below 0.01 (X!Tandem) resulted in a false discovery rate below 1% in all the cell lines. ITS (6 μL, equivalent to the amount of ITS loaded with the analyzed secretome sample) was analyzed by LC–MS/MS, and proteins identified in ITS were excluded from the secretome master list.

2.5. Lactate dehydrogenase (LDH) assay
The LDH assay was performed in triplicate using the LDH Toxicology Assay Kit (Sigma) according to the manufacturer's instructions. Cells (1 × 10⁶) from each cell line were lysed and serially diluted to generate a standard curve, which was used to determine the level of LDH released into the cell culture supernatant by dead cells.
2.6. Construction of Human Protein Mutant Database (HPMD)
Known mutations in human protein sequences were downloaded from UniProt [28], the Protein Mutation Database and OMIM [29]. Additionally, human disease-related mutations were downloaded from SysPIMP [30]. These known mutations were used to create a database of mutant proteins (31,479 mutations), using customized Perl scripts to introduce the disease mutations into the corresponding protein sequences. A nonredundant human protein database was used because each source reports mutations relative to its own sequence accession identifiers. For instance, UniProt reports mutations against UniProt sequence identifiers, whereas dbSNP reports polymorphisms against RefSeq sequence identifiers. A single mutation (one mutation per peptide) was placed at the center of a peptide with a maximum of 50 amino acids flanking it on either side (maximum length 101 amino acids). The peptide is shorter if the mutation lies closer than 50 residues to the N- or C-terminus of the protein. Likewise, SNPs were downloaded from NCBI dbSNP [31]. Of these, missense and nonsense SNPs in exonic regions were used to create a protein mutant database (140,440 mutations), using customized Perl scripts to introduce the SNPs at the protein level into the human RefSeq proteome. The SNP-derived protein mutant database and the known disease mutation database were combined to form the in-house HPMD (171,919 mutations) (Fig. 1). HPMD was further trimmed by using BLAST to remove mutant protein sequences that are 100% identical to existing sequences in the wild-type RefSeq protein database, i.e., cases where the mutant sequence is the wild-type sequence of another protein.
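The window construction described above can be sketched in Python. This is a minimal illustration, not the authors' Perl scripts. It assumes that mutation labels use 1-based positions in the style of the Fig. 1 examples (e.g., P249S). The function names `mutant_window` and `drop_wildtype_identical` are hypothetical, and an exact substring search stands in for the BLAST 100%-identity filter.

```python
def mutant_window(protein, label, flank=50):
    """Return an HPMD-style peptide for one substitution such as 'P249S'.

    The mutant residue sits at the center with up to `flank` residues on
    each side (at most 2 * flank + 1 residues); the window is truncated
    when the mutation lies near the N- or C-terminus.
    """
    wt, pos, mut = label[0], int(label[1:-1]), label[-1]
    i = pos - 1  # convert the 1-based position to a 0-based index
    if protein[i] != wt:
        raise ValueError(f"{label}: residue {pos} is {protein[i]}, not {wt}")
    mutated = protein[:i] + mut + protein[i + 1:]
    start = max(0, i - flank)
    end = min(len(protein), i + flank + 1)
    return mutated[start:end]


def drop_wildtype_identical(windows, wildtype_proteins):
    """Remove mutant windows found verbatim in any wild-type protein.

    Exact-substring stand-in for the BLAST 100%-identity trimming step.
    """
    return {name: seq for name, seq in windows.items()
            if not any(seq in wt for wt in wildtype_proteins)}


if __name__ == "__main__":
    toy = "ACDEFGHIKLMNPQRSTVWY" * 20          # hypothetical 400-residue protein
    print(len(mutant_window(toy, "K249Q")))   # centered window
    print(len(mutant_window(toy, "D23N")))    # truncated at the N-terminus
```

Centering each mutation in a fixed-size window keeps every mutant entry short and limited to a single substitution, which matches the one-mutation-per-peptide design described above.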
2.7. Mutant protein database searching
LC–MS/MS spectra that did not match any peptide in the regular database search (wild-type RefSeq proteome database) were separated out using customized Perl scripts. These unmatched LC–MS/MS spectra were searched against the protein mutant database in a target-decoy fashion using X!Tandem (version 2008.12.01.1). Search parameters were: fixed modification, carboxymethylation of cysteine (+58 Da); variable modification, oxidation of methionine (+16 Da); up to three missed tryptic cleavages; 20 ppm peptide mass tolerance; and 0.8 Da fragment ion mass tolerance. Peptide identifications with an e-value below 0.01 resulted in a false discovery rate below 2% in all the cell lines. The resulting mutations were further validated in a semiautomatic fashion to shortlist potential mutations. Mutated peptide hits were removed if they met any of the following criteria:
1. The peptide was also identified in the protein additive ITS (ITS was analyzed by LC–MS/MS and the MS data were searched against the mutant database).
2. The mutation is a substitution between amino acids of the same molecular weight (e.g., Ile–Leu).
3. The mutation is a substitution between amino acids differing by 1 Da and the hit has a high delta value (e.g., Glu–Gln).
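Criteria 2 and 3 above are mass-based. The following is a minimal Python sketch of how such substitutions could be flagged, assuming standard monoisotopic residue masses. The tolerance values `ISOBARIC_TOL` and `ONE_DA_WINDOW` and the function names are hypothetical, because no numeric cut-offs are stated. The "high delta value" score condition in criterion 3 is not modeled. One plausible rationale for criterion 3 is that a Gln→Glu (or Asn→Asp) substitution produces the same mass change (+0.984 Da) as deamidation, a common sample-handling modification.

```python
# Monoisotopic residue masses (Da); cysteine is listed unmodified for
# simplicity, although the searches used carboxymethyl-Cys (+58 Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Hypothetical thresholds (Da); not taken from the paper.
ISOBARIC_TOL = 0.05
ONE_DA_WINDOW = (0.9, 1.1)


def parse_mutation(label):
    """Split a label such as 'A25T' into ('A', 25, 'T')."""
    return label[0], int(label[1:-1]), label[-1]


def substitution_flag(label):
    """Return 'isobaric', 'one_da' or None for a substitution label."""
    wt, _, mut = parse_mutation(label)
    delta = abs(RESIDUE_MASS[mut] - RESIDUE_MASS[wt])
    if delta <= ISOBARIC_TOL:
        return "isobaric"   # e.g. Ile/Leu: identical residue mass
    if ONE_DA_WINDOW[0] <= delta <= ONE_DA_WINDOW[1]:
        return "one_da"     # e.g. Gln/Glu: same shift as deamidation
    return None


if __name__ == "__main__":
    for m in ["I10L", "Q5E", "K458Q", "A25T"]:
        print(m, substitution_flag(m))
```

With these assumed thresholds, Lys/Gln (0.036 Da apart) is also flagged as near-isobaric. Whether such pairs should be excluded would depend on the precursor mass accuracy actually achieved.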
Fig. 1 – Construction of the Human Protein Mutant Database (HPMD) for MS-based protein mutation search. [Schematic: known protein disease mutations (OMIM, PMD, SysPIMP, UniProt) and dbSNP-derived protein mutations (exonic SNP classes shown: missense, nonsense, synonymous and frameshift) are substituted into protein sequences. Examples: P249S is centered with 50 aa on each side; M23V, near the protein start, has a 22 aa N-terminal and a 50 aa C-terminal flank; K458Q, near the protein end, has a 50 aa N-terminal and a 10 aa C-terminal flank.] Known protein disease mutations downloaded from OMIM, PMD, SysPIMP and UniProt were combined with the missense and nonsense SNPs from dbSNP. The mutations are substituted into protein sequences to form peptides (maximum 101 amino acids) containing the mutations. Each mutation was fixed at the center (51st residue) unless it lies close to the start or end of the protein sequence. The database comprises 171,919 mutations (31,479 known disease mutations and 140,440 dbSNP-derived mutations).
Only peptide mutations detected in more than one cell line, or by more than one MS/MS spectrum in the same cell line, were retained, which further reduced the FDR to below 0.3%. The shortlisted potential mutations were then validated by manually interrogating the MS/MS spectra to obtain the final list of potential protein mutations observed in the CRC secretome.
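The replicate filter above and the target-decoy FDR estimates quoted in this section can be sketched as follows. This minimal Python illustration rests on several assumptions. Each peptide-spectrum match is represented as a `(peptide, cell_line, is_decoy)` tuple. FDR is estimated at the unique-peptide level as decoy hits divided by target hits, a common estimator for separate target-decoy searches; the level at which the authors counted is not stated. The function names are hypothetical and are not part of X!Tandem or the authors' Perl scripts.

```python
from collections import defaultdict


def replicate_filter(psms):
    """Keep peptides seen in >1 cell line, or by >1 spectrum in one line.

    `psms` is an iterable of (peptide, cell_line, is_decoy) tuples, one per
    peptide-spectrum match. Returns {peptide: is_decoy} for kept peptides.
    """
    spectra = defaultdict(lambda: defaultdict(int))  # peptide -> line -> count
    decoy = {}
    for peptide, cell_line, is_decoy in psms:
        spectra[peptide][cell_line] += 1
        decoy[peptide] = is_decoy
    return {peptide: decoy[peptide]
            for peptide, per_line in spectra.items()
            if len(per_line) > 1 or max(per_line.values()) > 1}


def decoy_fdr(is_decoy_by_peptide):
    """Estimate FDR as (#decoy peptides) / (#target peptides)."""
    decoys = sum(1 for d in is_decoy_by_peptide.values() if d)
    targets = len(is_decoy_by_peptide) - decoys
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    # toy PSMs; peptide and cell-line names are placeholders
    psms = [
        ("PEPTIDEA", "L1", False), ("PEPTIDEA", "L2", False),  # two lines
        ("PEPTIDEB", "L1", False), ("PEPTIDEB", "L1", False),  # two spectra
        ("PEPTIDEC", "L3", False),                             # single hit
        ("DECOYX", "L1", True), ("DECOYX", "L2", True),
        ("DECOYY", "L4", True),
    ]
    everything = {p: d for p, _, d in psms}
    print(decoy_fdr(everything), decoy_fdr(replicate_filter(psms)))
```

In this toy example, requiring replicate evidence removes one target and one decoy peptide, and the estimated FDR falls from 2/3 to 1/2. This illustrates the direction of the effect described above, not its magnitude.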
2.8. RNA extraction and RT-PCR
Total RNA from selected CRC cell lines was extracted using the RNeasy® Plus Mini Kit (Qiagen), and the amount of RNA was estimated by spectrophotometric analysis (OD 260). cDNA was synthesized from 1 μg of RNA using SuperScript™ III First-Strand Synthesis SuperMix for qRT-PCR (Invitrogen) according to the manufacturer's instructions. The synthesized cDNA was then amplified by PCR using high-fidelity Platinum Pfx DNA polymerase (Invitrogen) in a GeneAmp PCR System 9700 (Applied Biosystems). PCR products were analyzed by electrophoresis on a 2% (w/v) agarose gel, visualized with ethidium bromide and submitted for cDNA sequencing (Australian Genome Research Facility, Melbourne, Australia) with the corresponding sequencing primers (the PCR and sequencing primer pairs used are listed in Supplementary Table 1).
3. Results and discussion
3.1. The colorectal cancer secretome dataset contains 2728 unique proteins
We mapped the CRC secretome using a panel of 18 CRC cell lines that reflect different stages of the disease and different underlying mutations (Supplementary Table 2). The cell lines were grown with supplements (e.g., 0.8% ITS) to minimize cell lysis and culture-related artifacts. Proteins released from tumor cells into the culture media were concentrated and subjected to electrophoretic separation and in-gel trypsinization, followed by LC–MS/MS. The resulting MS data sets were searched against the wild-type RefSeq protein database using MASCOT and X!Tandem, and the protein identifications were further trimmed and combined to obtain the final protein list. The 18 CRC cell line analyses yielded an average of 981 protein identifications per cell line, with an average false discovery rate of 0.4% (from a target-decoy database search); >58% of proteins were identified with 2 or more unique peptides (Table 1). In total, this large-scale analysis yielded 17,664 high-confidence protein identifications across the cell lines, corresponding to 2728 unique proteins (Mathivanan and Simpson, manuscript in preparation).
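As a consistency check, the summary figures quoted above can be recomputed from Table 1 after excluding the two non-CRC secretomes, MCF-7 (breast cancer) and U87 (glioma). Below is a minimal Python sketch with the counts transcribed from Table 1; the names `TABLE_1` and `crc_summary` are hypothetical.

```python
# Table 1 rows: (proteins with 1 unique peptide, proteins with >= 2 unique
# peptides, total proteins, FDR in %), transcribed from the table.
TABLE_1 = {
    "Caco-2": (441, 615, 1056, 0.50), "HCA-7": (319, 415, 734, 0.53),
    "HCT-116": (522, 823, 1345, 0.38), "HCT-15": (297, 397, 694, 0.35),
    "HT-29": (347, 379, 726, 0.32), "LIM1215": (436, 686, 1122, 0.37),
    "LIM1863": (404, 684, 1088, 0.50), "LIM1899": (325, 419, 744, 0.37),
    "LIM2405": (311, 400, 711, 0.40), "LIM2408": (570, 762, 1332, 0.39),
    "LIM2537": (327, 481, 808, 0.39), "LIM2550": (403, 555, 958, 0.35),
    "LIM2551": (514, 763, 1277, 0.38), "LOVO": (357, 387, 744, 0.51),
    "LS174T": (448, 923, 1371, 0.42), "MCF-7": (565, 1126, 1691, 0.40),
    "SW1222": (402, 450, 852, 0.57), "SW1463": (395, 720, 1115, 0.36),
    "T84": (426, 561, 987, 0.32), "U87": (432, 691, 1123, 0.35),
}
NON_CRC = {"MCF-7", "U87"}  # breast cancer and glioma secretomes


def crc_summary(table):
    """Summarize the CRC rows of Table 1 (non-CRC lines excluded)."""
    crc = {k: v for k, v in table.items() if k not in NON_CRC}
    for name, (one, multi, total, _) in crc.items():
        if one + multi != total:  # each row must add up
            raise ValueError(f"row {name} is inconsistent")
    total_ids = sum(v[2] for v in crc.values())
    multi_ids = sum(v[1] for v in crc.values())
    return {
        "n_lines": len(crc),
        "total_entries": total_ids,
        "mean_ids_per_line": total_ids / len(crc),
        "fraction_multi_peptide": multi_ids / total_ids,
        "mean_fdr_percent": sum(v[3] for v in crc.values()) / len(crc),
    }


if __name__ == "__main__":
    print(crc_summary(TABLE_1))
```

On the transcribed counts, this gives 17,664 protein entries over 18 CRC cell lines, about 981 per line. The multi-peptide fraction is about 0.59 and the mean FDR about 0.41%, in line with the figures reported above.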
3.2. Mass spectrometry-based protein mutational analysis of secreted proteins (iMASp)
As cancers arise as a consequence of accumulated mutations in key proteins that regulate cell proliferation, differentiation and death [1], we set out to systematically identify mutated proteins in our CRC secretome dataset. Such mutant secreted proteins, a subset of the 'cancer secretome', might be a useful source of biomarkers for disease detection and prognosis. Current techniques to detect mutations are heavily biased toward genomic technologies [14]. To identify mutations at the protein level, we developed iMASp, to our knowledge the first MS-based protein mutation identification method. Because MS-based protein identification relies on search engines that match spectra against extant protein databases [32], a mutated sequence that is absent from the database cannot be detected. We therefore created a Human Protein Mutant Database (HPMD) that encompasses both known functional mutations (31,479) and SNPs (140,440 mutations). HPMD is publicly accessible from http://www.exocarta.org/HPMD. HPMD (Fig. 1) was constructed from peptides containing the mutated amino acid residue at the center, flanked by a maximum of 50 amino acids on each side. Unmatched MS/MS spectra from the secretome datasets (i.e., 1 glioma, 1 breast cancer and 18 CRC cell lines) were searched against the mutant database with X!Tandem, without spectra filtering criteria. The non-CRC cell lines (glioma and breast cancer) were used to identify mutations specific to the CRC cell lines, because mutations identified in both CRC and non-CRC cell lines are not specific to CRC. The resulting peptide hits were filtered and manually validated, yielding 112 peptide identifications (57 unique proteins) (Fig. 2). SignalP [33], PrediSi [34] and Phobius [35] were used to predict signal peptides (indicative of classically secreted proteins) in the 57 mutant protein sequences.
In addition, the 57 mutant proteins were filtered based on evidence of their secretion obtained from literature-derived databases such as UniProt [28], Ensembl [36], NCBI Entrez Gene, ExoCarta [37], Human Proteinpedia [38] and HPRD [39]. This analysis revealed that 97 mutations (53 proteins, 93%) are categorized as secreted.

Table 1 – Total number of proteins identified per number of unique peptides for the 18 CRC cell line-derived secretomes and the two non-CRC secretomes (MCF-7, breast cancer; U87, glioma).

| Cell line | Proteins with 1 unique peptide | Proteins with ≥ 2 unique peptides | Total | FDR (%) |
|---|---|---|---|---|
| Caco-2 | 441 | 615 | 1056 | 0.50 |
| HCA-7 | 319 | 415 | 734 | 0.53 |
| HCT-116 | 522 | 823 | 1345 | 0.38 |
| HCT-15 | 297 | 397 | 694 | 0.35 |
| HT-29 | 347 | 379 | 726 | 0.32 |
| LIM1215 | 436 | 686 | 1122 | 0.37 |
| LIM1863 | 404 | 684 | 1088 | 0.50 |
| LIM1899 | 325 | 419 | 744 | 0.37 |
| LIM2405 | 311 | 400 | 711 | 0.40 |
| LIM2408 | 570 | 762 | 1332 | 0.39 |
| LIM2537 | 327 | 481 | 808 | 0.39 |
| LIM2550 | 403 | 555 | 958 | 0.35 |
| LIM2551 | 514 | 763 | 1277 | 0.38 |
| LOVO | 357 | 387 | 744 | 0.51 |
| LS174T | 448 | 923 | 1371 | 0.42 |
| MCF-7 | 565 | 1126 | 1691 | 0.40 |
| SW1222 | 402 | 450 | 852 | 0.57 |
| SW1463 | 395 | 720 | 1115 | 0.36 |
| T84 | 426 | 561 | 987 | 0.32 |
| U87 | 432 | 691 | 1123 | 0.35 |
3.3. Comparison of cell line-derived mutations with tumor tissue revealed 3 protein mutations in common
We recognize that some of the protein mutations detected in our CRC secretome by LC–MS/MS may not reflect the primary tissue, because cell lines accumulate new mutations during continuous passaging [40]. To identify protein mutations relevant to CRC, we mined the Liebler colon adenocarcinoma tissue-based MS data set [41] (obtained from the National Cancer Institute) to establish whether any of the 112 protein mutations found in our CRC secretome are present in that primary tissue MS data set. Encouragingly, even though the Liebler data set was derived from only one patient, we identified 3 hitherto unrecognized protein mutations common to both data sets: in mitochondrial enoyl CoA hydratase (ECHS1), keratin 10 (KRT10) and tubulin alpha 3c (TUBA3C). Given that we compared data from a single colon adenocarcinoma tissue homogenate (enriched in intracellular proteins) with our CRC secretome (enriched in secreted proteins), the 3 mutations found in common by iMASp illustrate the ability of the method to identify mutations at the protein level by large-scale LC–MS/MS. In this respect, applying iMASp to data from primary cancer tissue biopsies has the potential to significantly accelerate the identification of disease-associated mutations at the protein level.
3.4. RT-PCR and cDNA sequencing of selected peptides confirm the mutations at the nucleotide level
The 112 mutations (in 57 unique proteins) that we detected using iMASp in our CRC secretome data set are novel in the context of CRC (Fig. 2). Although many of these mutations have been described in other disease settings or reported in the SNP database, this is the first report of their presence in CRC. The functional consequences of these mutations in CRC remain to be determined. Proteins with identified mutations include GDF15, ACTG1, ALCAM, ANXA2, APEX1, CAPG, CAST, CLCN1, HSPG2 and KLK10 (Supplementary Table 3). Of the 112 mutations identified, 8 were selected for further validation (Table 2), and RT-PCR and cDNA sequencing confirmed the presence of the mutant form in the corresponding CRC cell lines in all 8 cases. For example, the A25T mutation in CST3 (peptide VSPATGSSPGKPPR, residues 21–34) was detected at the protein level (i.e., at the tryptic peptide level by LC–MS/MS) in LIM1899 and T84 cells and was confirmed by RT-PCR and cDNA sequencing in both cell lines (Fig. 3). The corresponding wild-type CST3 peptide (VSPAAGSSPGKPPR) was detected by MS in 7 CRC cell line secretomes (Caco-2, HCA-7, HCT-116, HT-29, LIM2405, LS174T and SW1463) and in the non-CRC U87 cell line secretome. The mutated proteins found in this CRC secretome study, especially those with missense mutations, represent a hitherto untapped source of potential CRC biomarkers that warrants further study using evolving proteomics technologies such as selected reaction monitoring [17].
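The CST3 example can be checked numerically. The minimal Python sketch below uses hypothetical function names. It translates the codons reported by cDNA sequencing with the standard genetic code and computes neutral monoisotopic masses of the unmodified wild-type and mutant peptides from standard residue masses. The Ala→Thr substitution shifts the peptide mass by about +30.011 Da, roughly 23,000 ppm. That is far outside the 20 ppm peptide mass tolerance used in the searches, so spectra of the mutant peptide cannot be matched to the wild-type sequence in the RefSeq database.

```python
# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Monoisotopic residue masses (Da) for the residues in the CST3 peptides.
RESIDUE_MASS = {
    "A": 71.03711, "G": 57.02146, "K": 128.09496, "P": 97.05276,
    "R": 156.10111, "S": 87.03203, "T": 101.04768, "V": 99.06841,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def ppm_difference(observed, reference):
    """Mass difference expressed in parts per million of the reference."""
    return (observed - reference) / reference * 1e6


if __name__ == "__main__":
    print(CODON_TABLE["GCC"], "->", CODON_TABLE["ACC"])  # codon change
    wt = peptide_mass("VSPAAGSSPGKPPR")
    mut = peptide_mass("VSPATGSSPGKPPR")
    print(f"{wt:.4f} {mut:.4f} {mut - wt:+.4f} Da "
          f"({ppm_difference(mut, wt):.0f} ppm)")
```

Running the script confirms that GCC encodes Ala and ACC encodes Thr, consistent with the A25T assignment.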
Fig. 2 – Schematic of iMASp. [Flowchart: the secretome is analyzed by LC–MS/MS. In the wild-type database search, MS/MS spectra are searched with Mascot and X!Tandem against the human RefSeq proteome database; identified MS/MS spectra yield 2728 proteins, of which 2249 (82%) pass the check for bona fide secreted proteins. In the mutant database search, unmatched MS/MS spectra are searched with X!Tandem against the Human Protein Mutant Database (HPMD), yielding 112 mutations (57 proteins), of which 97 mutations (53 proteins, 93%) pass the same check. The secretion check uses literature-derived databases (HPRD, Entrez Gene, Ensembl, ExoCarta, Human Proteinpedia, UniProt) and signal peptide prediction (Phobius, SignalP).] CRC secretome MS/MS spectra are searched against the wild-type RefSeq human proteome, and 2728 proteins were identified from the CRC secretome dataset. Unmatched LC–MS/MS spectra from the CRC secretome cell lines were searched against HPMD with the X!Tandem search engine. Identified peptides were further trimmed and manually validated to obtain the final list of 112 mutated peptides from 57 proteins. Interrogation for soluble secreted proteins identified 53 of these mutant proteins as bona fide secreted proteins.
Table 2 – Mutations identified in the CRC cell line secretome.

| # | Protein accession | Mutation | Gene symbol | Colorectal tissue | Glioma/breast cancer | RT-PCR and cDNA sequencing confirmation (cell lines) |
|---|---|---|---|---|---|---|
| 1 | NP_001618.2 | M301T | ALCAM | – | – | HCA-7, T84 |
| 2 | NP_000079.2 | G1022S | COL1A1 | – | – | HCA-7 |
| 3 | NP_000090.1 | A25T | CST3 | – | – | LIM1899, T84 |
| 4 | NP_005551.3 | F1807S | LAMA5 | – | – | LIM2405, LIM2408, LIM2537, SW1222 |
| 5 | NP_005579.2 | R518Q | MEP1A | – | – | LIM1863 |
| 6 | NP_036533.2 | K318E | PLXNB2 | – | – | LIM1899, T84 |
| 7 | NP_002817.2 | G200A | QSOX1 | – | – | HCT-116, SW1463 |
| 8 | NP_005992.1 | V75L | TUBA3C | Present | MCF7 | LOVO |
Fig. 3 – Validation of a protein mutation obtained by iMASp. [Panels: searching CRC secretome LC–MS/MS spectra against the wild-type RefSeq database with X!Tandem detects the CST3 wild-type peptide VSPAAGSSPGKPPR in Caco-2, HCA-7, HCT-116, HT-29, LIM2405, LS174T and SW1463; searching the unmatched spectra against the Human Protein Mutant Database detects the CST3 mutant (A25T) peptide VSPATGSSPGKPPR in LIM1899 and T84. MS/MS spectra of the observed peptides are shown for SW1463 (wild type) and LIM1899 (mutant). RT-PCR gel: size marker lane (M; 100 and 500 bp indicated) and LIM1899 and T84 lanes showing a 347 bp product; cDNA sequencing of LIM1899 shows GCC → ACC.] Validation of the mutation by RT-PCR and cDNA sequencing is shown. iMASp identified a mutation in CST3 (A25T) by LC–MS/MS. The wild-type peptide (VSPAAGSSPGKPPR) was detected in the Caco-2, HCA-7, HCT-116, HT-29, LIM2405, LS174T, SW1463 and U87 secretomes, whereas the mutated peptide (VSPATGSSPGKPPR) was detected in the LIM1899 and T84 secretomes. MS/MS spectra of the wild-type and mutated peptides are shown from the SW1463 and LIM1899 cell line secretomes, respectively. RT-PCR and cDNA sequencing confirmed the mutation in the CST3 mRNA (GCC → ACC). The agarose gel image of the PCR product and the cDNA sequencing profile of the PCR product are displayed.
4. Conclusions
Further studies are needed to categorize the mutations identified in this study as driver or passenger mutations [5,4]. Overall, the MS-based protein mutation discovery strategy we describe provides a method for identifying potential biomarkers for blood-based diagnostic and prognostic use, especially when coupled with non-antibody-based quantitative assays (SRM) for the derived mutant tryptic peptides. MS datasets obtained from various cell lines, disease tissues and body fluids are deposited in public repositories (~12 TB of data) [42–44], and iMASp can be adapted to re-search such datasets to uncover hitherto undiscovered protein mutations. Beyond proteomic studies, iMASp can equally be applied to genome-wide association studies, where discriminating true positive genomic alterations from false positives presents a major technical challenge. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jprot.2012.06.031.
Acknowledgments
The authors wish to thank Donna Dorow for her editorial assistance and insightful suggestions. This work was supported by the National Health and Medical Research Council (NH&MRC Program grant 487922 to RJS) and NH&MRC Fellowship #1016599 to SM. We thank the Australian Cancer Research Foundation for the purchase of the LTQ-Orbitrap mass spectrometer used in this study.
REFERENCES
[1] Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, et al. Mutations of the BRAF gene in human cancer. Nature 2002;417:949–54.
[2] Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, et al. The genomic landscapes of human breast and colorectal cancers. Science 2007;318:1108–13.
[3] Teschendorff AE, Caldas C. The breast cancer somatic ‘muta-ome’: tackling the complexity. Breast Cancer Res 2009;11:301.
[4] Bozic I, Antal T, Ohtsuki H, Carter H, Kim D, Chen S, et al. Accumulation of driver and passenger mutations during tumor progression. Proc Natl Acad Sci U S A 2010;107:18545–50.
[5] Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature 2009;458:719–24.
[6] Wang Q, Chaerkady R, Wu J, Hwang HJ, Papadopoulos N, Kopelovich L, et al. Mutant proteins as cancer-specific biomarkers. Proc Natl Acad Sci U S A 2011;108:2444–9.
[7] Ma PC, Zhang X, Wang ZJ. High-throughput mutational analysis of the human cancer genome. Pharmacogenomics 2006;7:597–612.
[8] Calvert PM, Frucht H. The genetics of colorectal cancer. Ann Intern Med 2002;137:603–12.
[9] Nakamura Y, Nishisho I, Kinzler KW, Vogelstein B, Miyoshi Y, Miki Y, et al. Mutations of the adenomatous polyposis coli gene in familial polyposis coli patients and sporadic colorectal tumors. Princess Takamatsu Symp 1991;22:285–92.
[10] Wheeler JM, Bodmer WF, Mortensen NJ. DNA mismatch repair genes and colorectal cancer. Gut 2000;47:148–53.
[11] Laken SJ, Petersen GM, Gruber SB, Oddoux C, Ostrer H, Giardiello FM, et al. Familial colorectal cancer in Ashkenazim due to a hypermutable tract in APC. Nat Genet 1997;17:79–83.
[12] Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61:759–67.
[13] Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res 2011;39:D945–50.
[14] Strom CM. Mutation detection, interpretation, and applications in the clinical laboratory setting. Mutat Res 2005;573:160–7.
[15] Wang Z, Shen D, Parsons DW, Bardelli A, Sager J, Szabo S, et al. Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science 2004;304:1164–6.
[16] Bardelli A, Parsons DW, Silliman N, Ptak J, Szabo S, Saha S, et al. Mutational analysis of the tyrosine kinome in colorectal cancers. Science 2003;300:949.
[17] Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010;7:43–6.
[18] Klein-Scory S, Kubler S, Diehl H, Eilert-Micus C, Reinacher-Schick A, Stuhler K, et al. Immunoscreening of the extracellular proteome of colorectal cancer cells. BMC Cancer 2010;10:70.
[19] Pavlou MP, Diamandis EP. The cancer cell secretome: a good source for discovering biomarkers? J Proteomics 2010;73:1896–906.
[20] Wu CC, Chen HC, Chen SJ, Liu HP, Hsieh YY, Yu CJ, et al. Identification of collapsin response mediator protein-2 as a potential marker of colorectal carcinoma by comparative analysis of cancer cell secretomes. Proteomics 2008;8:316–32.
[21] Xue H, Lu B, Zhang J, Wu M, Huang Q, Wu Q, et al. Identification of serum biomarkers for colorectal cancer metastasis using a differential secretome approach. J Proteome Res 2010;9:545–55.
[22] Zhang HH, Walker F, Kiflemariam S, Whitehead RH, Williams D, Phillips WA, et al. Selective inhibition of proliferation in colorectal carcinoma cell lines expressing mutant APC or activated B-Raf. Int J Cancer 2009;125:297–307.
[23] Whitehead RH, Zhang HH, Hayward IP. Retention of tissue-specific phenotype in a panel of colon carcinoma cell lines: relationship to clinical correlates. Immunol Cell Biol 1992;70(Pt 4):227–36.
[24] Simpson RJ, Connolly LM, Eddes JS, Pereira JJ, Moritz RL, Reid GE. Proteomic analysis of the human colon carcinoma cell line (LIM 1215): development of a membrane protein database. Electrophoresis 2000;21:1707–32.
[25] Mathivanan S, Lim JW, Tauro BJ, Ji H, Moritz RL, Simpson RJ. Proteomics analysis of A33 immunoaffinity-purified exosomes released from the human colon tumor cell line LIM1215 reveals a tissue-specific protein signature. Mol Cell Proteomics 2010;9:197–208.
[26] Chen YS, Mathias RA, Mathivanan S, Kapp EA, Moritz RL, Zhu HJ, et al. Proteomics profiling of Madin–Darby canine kidney plasma membranes reveals Wnt-5a involvement during oncogenic H-Ras/TGF-beta-mediated epithelial-mesenchymal transition. Mol Cell Proteomics 2011;10(M110.001131).
[27] Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007;35:D61–5.
[28] UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res 2010;38:D142–8.
[29] Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res 2009;37:D793–6.
[30] Xi H, Park J, Ding G, Lee YH, Li Y. SysPIMP: the web-based systematical platform for identifying human disease-related mutated sequences from mass spectrometry. Nucleic Acids Res 2009;37:D913–20.
[31] Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.
[32] Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 2003;422:198–207.
[33] Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004;340:783–95.
[34] Hiller K, Grote A, Scheer M, Munch R, Jahn D. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res 2004;32:W375–9.
[35] Kall L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J Mol Biol 2004;338:1027–36.
[36] Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, et al. BioMart—biological queries made easy. BMC Genomics 2009;10:22.
[37] Mathivanan S, Fahner CJ, Reid GE, Simpson RJ. ExoCarta 2012: database of exosomal proteins, RNA and lipids. Nucleic Acids Res 2012;40:D1241–4.
[38] Mathivanan S, Ahmed M, Ahn NG, Alexandre H, Amanchy R, Andrews PC, et al. Human Proteinpedia enables sharing of human protein data. Nat Biotechnol 2008;26:164–7.
[39] Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, et al. Human protein reference database—2006 update. Nucleic Acids Res 2006;34:D411–4.
[40] Borrell B. How accurate are cancer cell lines? Nature 2010;463:858.
[41] Slebos RJ, Brock JW, Winters NF, Stuart SR, Martinez MA, Li M, et al. Evaluation of strong cation exchange versus isoelectric focusing of peptides for multidimensional liquid chromatography–tandem mass spectrometry. J Proteome Res 2008;7:5286–94.
[42] Hill JA, Smith BE, Papoulias PG, Andrews PC. ProteomeCommons.org collaborative annotation and project management resource integrated with the Tranche repository. J Proteome Res 2010;9:2809–11.
[43] Mathivanan S, Pandey A. Human Proteinpedia as a resource for clinical proteomics. Mol Cell Proteomics 2008;7:2038–47.
[44] Deutsch EW. The PeptideAtlas Project. Methods Mol Biol 2010;604:285–96.