Accepted Manuscript

Integration of quantitative proteomics data and interaction networks: identification of dysregulated cellular functions during cancer progression

Andreas Zanzoni, Christine Brun

PII: S1046-2023(15)30089-X
DOI: http://dx.doi.org/10.1016/j.ymeth.2015.09.014
Reference: YMETH 3795
To appear in: Methods
Received Date: 14 June 2015
Revised Date: 2 September 2015
Accepted Date: 14 September 2015
Please cite this article as: A. Zanzoni, C. Brun, Integration of quantitative proteomics data and interaction networks: identification of dysregulated cellular functions during cancer progression, Methods (2015), doi: http://dx.doi.org/10.1016/j.ymeth.2015.09.014

Integration of quantitative proteomics data and interaction networks: identification of dysregulated cellular functions during cancer progression

Andreas Zanzoni1,2 and Christine Brun1,2,3,*

1 Inserm, UMR_S1090 TAGC, Marseille, F-13288, France
2 Aix-Marseille Université, UMR_S1090 TAGC, Marseille, F-13288, France
3 CNRS, Marseille, F-13402, France

* Correspondence: Dr. Christine Brun, TAGC UMR_S1090, Inserm, Aix-Marseille Université, Marseille, France. Tel: +33 4 91 82 87 12. Fax: +33 4 91 82 87 01. E-mail: [email protected] mrs.fr.

Abstract
Quantitative proteomics allows the characterization of molecular changes between healthy and disease states. Integrating such datasets with the protein-protein interaction network provides a more comprehensive understanding of cellular function dysregulation in diseases than considering lists of dysregulated proteins alone. Here, we propose a novel computational method, which combines protein interaction network and statistical analyses to establish expression profiles at the network module level rather than at the individual protein level, and to detect and characterize dysregulated network modules through different stages of cancer progression. We applied our approach to two publicly available datasets as case studies.

Keywords: bioinformatics, quantitative proteomics, protein interaction, network modules, cancer progression

1. Introduction
High-throughput technologies such as gene expression profiling have permitted the characterization of molecular changes between healthy and disease states. They have led to the identification of distinct phenotypic classes and stages, crucial for therapeutic intervention [1–5]. Nevertheless, the applicability in clinical practice of the resulting plethora of prognostic signatures is still limited [6].

Integrating protein interaction data with omics data such as gene expression profiles can help identify molecular perturbations that might be implicated in disease. For this purpose, several computational methods have been developed [7], some of which have been applied in cancer biology to discover novel genes related to B-cell lymphomas [8] and to improve breast cancer classification [9]. However, these approaches rely on the assumption that there is a correlation between the expression of a gene and its corresponding protein(s), although many mechanisms may uncouple transcription from translation [10]. Thus, considering protein levels is more appropriate since they reflect the cell phenotype more precisely [11]. Indeed, recent reports illustrate the benefit of integrating protein expression and interaction data to predict, for instance, novel functions for the G protein-coupled receptor rhodopsin in photoreceptor cells [12] or to identify the cellular processes involved in cell cycle entry in human T lymphocytes [13]. Furthermore, current quantitative proteomics technologies enable the identification and the quantification of thousands of proteins and are paving the way to a systems-wide analysis of proteomes [14]. This is particularly relevant for the study of human diseases such as cancer, in which cell proteomes are reshaped during disease progression, leading to or resulting from the dysregulation of cellular processes. As the proteomic landscapes of cancer cell lines and tumor samples at different stages are becoming available [15,16], it is thus crucial to develop computational network-based approaches to analyze high-resolution cancer proteomics data.

To this end, we present in this work a computational network-based framework to identify significantly dysregulated cellular functions during disease progression by combining quantitative proteomics data with protein interaction analysis. Proteins seldom act alone but rather perform their functions by interacting with other proteins in macromolecular complexes, metabolic and signaling pathways [17,18]. Considering these cellular functional units rather than the proteins alone should lead to a better grasp of the expression dysregulation leading to disease. We therefore propose a “functional modules” approach, as these modules provide a formal framework to investigate and infer protein cellular functions as well as their possible perturbations. Indeed, functional modules can be identified from protein interaction networks using tailored computational methods (reviewed in [19]). Based on a guilt-by-association principle, they allow the cellular function inference of the uncharacterized protein components. Furthermore, they can contain several interacting disease-related proteins [20,21], and alterations in one or more of these players may propagate to other components, eventually leading to cell network perturbations and disease phenotypes [22–24]. Integrating protein expression at this level of organization of the protein network thus appears particularly relevant for a better understanding of cellular function dysregulation in diseases.

Here, we propose a novel computational method, which combines protein interaction network and statistical analysis to interpret quantitative proteomics data. Overall, the strength of our approach is four-fold: (i) it provides an integrated cellular context by taking into account the expression profile at the network module level; (ii) it exploits protein expression values, which better describe cell phenotypes; (iii) it detects statistically dysregulated network modules through different stages of cancer progression; (iv) it highlights module members for which no proteomic data are available as candidate cancer proteins, possibly implicated in cancer progression and representing potential biomarkers or therapeutic targets, based on their participation in these modules. We applied our approach to two publicly available datasets as case studies: a panel of 11 cell lines recapitulating breast cancer progression [15] and a set of 155 normal and tumor colorectal samples taken from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal [25].

2. Materials and Methods

2.1 Overview
Our approach consists of the following steps (Figure 1): (i) generation of a human binary protein interaction network by gathering publicly available data; (ii) detection of network modules using the OCG (Overlapping Cluster Generator) algorithm [26]; (iii) functional annotation of the network modules; (iv) integration of quantitative proteomics data; (v) identification of network modules showing a statistically significant change in protein expression during progression, using non-parametric analysis of variance methods. Each step is described in detail in the following sub-sections.

2.2 Interactome construction
We used the human interactome that we built and described previously [27]. Briefly, protein interaction data were retrieved (February 2013) from several databases [28–36] through the PSICQUIC query interface [37]. Only binary interactions likely to be direct according to the experimental detection method [38,39] were kept. To reduce the redundancy among TrEMBL and SwissProt protein entries, the sequences were clustered using CD-HIT [40], and TrEMBL/SwissProt protein pairs sharing at least 95% sequence similarity were considered to be the same protein. Interactions assigned to the TrEMBL entry were then transferred to the SwissProt entry. In this way, a human binary interactome containing 74,388 interactions between 12,865 proteins was obtained (Supplementary Table S1).

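The transfer of interactions from TrEMBL to SwissProt entries can be illustrated with a minimal Python sketch. It assumes that the CD-HIT clustering has already been summarized as a mapping from each redundant TrEMBL accession to its SwissProt counterpart; the function name, the mapping and the accessions below are hypothetical and serve only as an illustration, not as the pipeline used in [27].

```python
def merge_redundant_entries(interactions, trembl_to_swissprot):
    """Transfer interactions of redundant TrEMBL entries to SwissProt entries.

    interactions: iterable of (accession_a, accession_b) binary interactions.
    trembl_to_swissprot: dict mapping a TrEMBL accession to the SwissProt
        accession considered to be the same protein (hypothetical input).
    Returns a set of undirected interactions without duplicates.
    """
    merged = set()
    for a, b in interactions:
        a = trembl_to_swissprot.get(a, a)
        b = trembl_to_swissprot.get(b, b)
        # An undirected interaction is stored as a sorted pair so that
        # (a, b) and (b, a) collapse into a single edge.
        merged.add(tuple(sorted((a, b))))
    return merged


# Hypothetical accessions, for illustration only.
interactions = [("P11111", "Q22222"), ("A0A000", "Q22222"), ("Q22222", "P11111")]
mapping = {"A0A000": "P11111"}
network = merge_redundant_entries(interactions, mapping)
```

In this toy example the three input pairs collapse into one interaction. How self-interactions created by the merge should be handled is not stated in the text and is left untouched here.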
2.3 Network module identification and annotation
We identified network modules using the OCG algorithm with default parameters [26], obtaining a collection of 855 modules ranging in size from 2 to 306 proteins (Supplementary Table S2). This algorithm decomposes a network into overlapping modules, based on modularity optimization [26].

We functionally annotated these network modules by assessing the over-representation of Gene Ontology (GO) biological process terms [41] (downloaded from the GOA website [42], April 2013) and cellular pathways taken from the KEGG [43], Reactome [44] and NCI-PID [45] databases (downloaded from MSigDB [46], April 2013). We considered only GO terms and pathways having at least 5, and no more than 500, annotated proteins in the human proteome. Enrichment P-values were computed using Fisher's exact test (one-sided) and corrected for multiple testing with the Benjamini-Hochberg procedure (significance threshold $\alpha = 1 \times 10^{-3}$). As background reference, we used the annotated proteins in the human interactome. Finally, to maximize the annotation coverage, we further characterized network modules by assessing their functional homogeneity based on GO biological process terms as described in [27]. Briefly, a GO term is assigned to a network module if at least 50% of its proteins share that GO term.

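The enrichment step can be sketched in Python using only the standard library. The one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution, and the Benjamini-Hochberg adjustment follows its usual step-up definition. The function names and the counts in the example are hypothetical; this is an illustration under those assumptions, not the code used for the analysis.

```python
from math import comb


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= k), where X counts annotated proteins in a module of
    size n drawn from a background of N proteins, K of which carry the
    annotation (hypergeometric upper tail).
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: a 20-protein module with 8 proteins annotated to a
# term carried by 50 of the 1000 annotated background proteins.
p_enrichment = fisher_right_tail(8, 20, 50, 1000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A term would then be retained for a module when its adjusted P-value falls below the threshold quoted above.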
2.4 Proteomics data processing and integration
Our approach takes as input protein expression data matrices (see Supplementary Data A). Only proteins present in the human interactome are considered. To estimate the expression across the different stages, we computed a Z-score of the expression values for each protein i as follows:

\[ Z_{i,j} = \frac{x_{i,j} - \mu_i}{\sigma_i} \]

where $x_{i,j}$ is the expression value of protein i in stage j, $Z_{i,j}$ is the Z-score of protein i in stage j, and $\mu_i$ and $\sigma_i$ are the mean expression value and the standard deviation of protein i across the stages, respectively. Finally, the Z-scores of all individual proteins belonging to the same network module are assembled as an expression distribution for every stage, therefore generating an expression profile for every module during progression (Figure 1).

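The Z-score transformation and the assembly of a module expression profile can be sketched as follows. The sketch assumes one expression value per protein and stage, uses the sample standard deviation (the text does not say whether the sample or population estimator was used), and sets the Z-scores of a protein with constant expression to zero; all three choices, like the function and protein names, are assumptions made for illustration.

```python
from statistics import mean, stdev


def zscores_across_stages(values):
    """Z-scores of one protein across stages: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    if sigma == 0:
        # Constant expression carries no stage information (assumption).
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


def module_profile(expression, module_members):
    """Assemble one Z-score distribution per stage for a module.

    expression: dict mapping a protein to its list of values, one per stage.
    module_members: proteins of the module; members without data are skipped.
    Returns a list with one list of Z-scores per stage.
    """
    z = {p: zscores_across_stages(expression[p])
         for p in module_members if p in expression}
    if not z:
        return []
    n_stages = len(next(iter(z.values())))
    return [[z[p][j] for p in z] for j in range(n_stages)]


# Hypothetical three-stage data; "P3" has no proteomic measurement.
expression = {"P1": [1.0, 2.0, 3.0], "P2": [3.0, 2.0, 1.0]}
profile = module_profile(expression, ["P1", "P2", "P3"])
```

Each inner list of `profile` is the expression distribution of the module at one stage, which is the input of the statistical tests described in Section 2.5.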
2.5 Module dysregulation assessment
Network modules containing at least 5 proteins and having quantitative expression data available for at least 50% of their components were selected for statistical analysis. We performed three non-parametric tests to assess module dysregulation. All tests were performed in the R statistical environment, and the resulting P-values were corrected for multiple testing with the Benjamini-Hochberg procedure.

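The two selection criteria (module size and proteomic coverage) can be expressed as a short filter. This is a minimal sketch; the function name, parameter names and toy modules are hypothetical.

```python
def select_modules(modules, quantified, min_size=5, min_coverage=0.5):
    """Keep modules with enough proteins and enough proteomic coverage.

    modules: dict mapping a module name to an iterable of its proteins.
    quantified: set of proteins with quantitative expression data.
    Returns a dict mapping each selected module to its quantified members.
    """
    selected = {}
    for name, members in modules.items():
        members = set(members)
        covered = members & quantified
        # The size check comes first, so an empty module never reaches
        # the division below.
        if len(members) >= min_size and len(covered) / len(members) >= min_coverage:
            selected[name] = covered
    return selected


# Hypothetical modules and measurements.
modules = {
    "M1": ["A", "B", "C", "D", "E", "F"],  # 6 proteins, 3 quantified: kept
    "M2": ["A", "B", "C", "D"],            # fewer than 5 proteins: dropped
    "M3": ["G", "H", "I", "J", "K"],       # 5 proteins, 1 quantified: dropped
}
quantified = {"A", "B", "C", "G"}
selected = select_modules(modules, quantified)
```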
2.5.1 Identification of dysregulated modules
We first applied the Kruskal-Wallis (KW) test, a non-parametric analysis of variance method, which compares, for every module, its expression distributions (one for each stage of progression). The KW test assesses whether at least one distribution “dominates” the others, meaning that it differs significantly from the other distributions. A module was considered significantly dysregulated when the corrected P-value was smaller than 0.05 (two-sided KW test).

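For readers who wish to see what the KW test computes, the following standard-library Python sketch ranks the pooled Z-scores, derives the tie-corrected H statistic and obtains the P-value from the chi-square approximation with (number of stages minus 1) degrees of freedom. The analysis itself was run in R; this re-implementation, including the helper names, is an illustrative assumption and it does not handle the degenerate case in which all values are identical. In practice a statistics library routine such as `scipy.stats.kruskal` would normally be preferred.

```python
from math import erfc, exp, gamma, sqrt


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df >= 1.

    Uses the recurrence Q(x; k) = Q(x; k-2) + (x/2)^(k/2-1) e^(-x/2) / Gamma(k/2).
    """
    z = x / 2
    if df % 2:
        q, k = erfc(sqrt(z)), 1
    else:
        q, k = exp(-z), 2
    while k < df:
        k += 2
        q += z ** (k / 2 - 1) * exp(-z) / gamma(k / 2)
    return q


def kruskal_wallis(groups):
    """Return (H, P-value) for a list of samples, one per stage."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, ties = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    h /= 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)  # tie correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical module profile with three stages.
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The P-values obtained for all tested modules would then be corrected with the Benjamini-Hochberg procedure before applying the 0.05 threshold.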
2.5.2 Dysregulated modules with increasing and decreasing expression trends across stages
We used the one-sided Jonckheere's Trend (JT) method (R package: SAGx) to detect dysregulated modules (among those found by the KW test) with significantly increasing or decreasing expression. To do so, the JT test takes into account the ordering of the stages, that is, from the initial stage (e.g., normal) to the most advanced one (e.g., metastasis). We considered a trend significant when the corrected P-value was smaller than 0.025.

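The idea behind the trend test can be sketched in plain Python. The statistic J counts, over all pairs of stages taken in progression order, how often a value from the earlier stage is smaller than a value from the later stage; a large J suggests an increasing trend and a small J a decreasing one. The sketch below uses the normal approximation without tie correction in the variance, which is an assumption made for brevity; it is not the SAGx implementation, and the function name is hypothetical.

```python
from math import erfc, sqrt


def jonckheere_trend(groups):
    """Jonckheere trend test on samples ordered from first to last stage.

    Returns (J, z, p_increasing, p_decreasing), where the two P-values are
    one-sided and based on the normal approximation (no tie correction).
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5  # ties count for one half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean_j = (n * n - sum(s * s for s in sizes)) / 4
    var_j = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean_j) / sqrt(var_j)
    p_increasing = 0.5 * erfc(z / sqrt(2))   # P(Z >= z)
    p_decreasing = 0.5 * erfc(-z / sqrt(2))  # P(Z <= z)
    return j_stat, z, p_increasing, p_decreasing


# Hypothetical module profile whose expression rises with each stage.
j_stat, z, p_inc, p_dec = jonckheere_trend([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For this toy profile every cross-stage pair is ordered, so J reaches its maximum of 27 and the increasing-trend P-value is small, while the decreasing-trend P-value is close to 1.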
2.5.3 Stage-specific dysregulated modules during progression
The KW test does not pinpoint which distribution(s) in a profile differ significantly from the others. To overcome this limitation and detect the dominant stage(s), if any, we applied the post-hoc Dunn test (DT) (R package: dunn.test). The DT method performs multiple pairwise comparisons between the expression distributions of a given module. We defined a module as dysregulated in a stage-specific manner if: (i) the expression distribution of a given stage is significantly greater than that of at least 75% of the other stages according to the DT test (corrected P-value<0.05); (ii) it does not show any significant dysregulation trend detected by the JT test.

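The decision rule defined above can be written down as a small function. The sketch assumes that the pairwise Dunn test has already been run and that its corrected P-values are available as a nested dictionary; this data layout, the function name and the stage labels are hypothetical and chosen only to make the rule explicit.

```python
def stage_specific_stages(p_greater, has_trend, alpha=0.05, min_fraction=0.75):
    """Apply the stage-specific rule to pairwise post-hoc results.

    p_greater[s][t]: corrected P-value for "stage s is greater than stage t"
        (hypothetical layout of the pairwise results).
    has_trend: True if the JT test detected a significant trend.
    Returns the stages that dominate at least min_fraction of the others.
    """
    if has_trend:
        return []  # criterion (ii): modules with a trend are excluded
    stages = list(p_greater)
    dominant = []
    for s in stages:
        others = [t for t in stages if t != s]
        wins = sum(p_greater[s][t] < alpha for t in others)
        if others and wins / len(others) >= min_fraction:
            dominant.append(s)
    return dominant


# Hypothetical corrected P-values for a four-stage profile.
p_greater = {
    "normal":     {"stage_I": 0.90, "stage_II": 0.99, "metastasis": 0.80},
    "stage_I":    {"normal": 0.20, "stage_II": 0.95, "metastasis": 0.40},
    "stage_II":   {"normal": 0.001, "stage_I": 0.01, "metastasis": 0.02},
    "metastasis": {"normal": 0.30, "stage_I": 0.60, "stage_II": 0.97},
}
dominant = stage_specific_stages(p_greater, has_trend=False)
```

With four stages, a stage has three comparators, so the 75% threshold requires it to be significantly greater than all three.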
3. Results and Discussion
We have applied the proposed method to two quantitative proteomic datasets. The first comprises the proteomes of 11 cell lines recapitulating the ER(-) breast tumor transformation process, generated by SILAC-based proteomics profiling [15] and containing 7,800 proteins. The second comprises the proteomic profiling of 3,899 genes across a set of 155 normal and tumor colorectal samples taken from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal [25].

3.1 Up to a half of the human network modules are dysregulated during cancer progression
The breast cancer progression (BC) dataset covers 38% of the proteins contained in the human interactome. This allowed us to analyze 414 network modules that fulfill our selection criteria (Section 2.5), i.e., roughly 50% of the 819 interactome modules formed by at least 5 proteins. According to the Kruskal-Wallis test, we found that 138 modules, containing a total of 2,985 proteins, were significantly dysregulated (Figure 2, Supplementary Table S3) across cancer stages. These modules represent 33% of the 414 analyzed modules and 17% of all 819 interactome modules.

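As a quick arithmetic check, the proportions quoted for the BC dataset follow directly from the module counts given in the text:

```latex
\[
\frac{414}{819} \approx 50.5\%, \qquad
\frac{138}{414} \approx 33.3\%, \qquad
\frac{138}{819} \approx 16.8\%.
\]
```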
Similarly, using the colorectal cancer (CRC) proteomics dataset (which covers 29% of the interactome proteins), we analyzed 157 network modules complying with the criteria described in Section 2.5 (i.e., 19% of the interactome modules). Seventy-seven of them (accounting for 49% of the studied modules and 9.5% of all interactome modules) were found significantly dysregulated according to the KW test (Figure 2, Supplementary Table S4).

Overall, these results show that for 33 to 49% of the investigated functional modules, the global protein expression differs significantly at one or several cancer progression stages, compared to the others. From a functional perspective, in both datasets, these dysregulated modules are mainly involved in signaling, cell cycle and transcriptional regulation (Supplementary Tables S5 and S6).

The fact that a higher proportion of dysregulated modules is detected when using CRC compared to BC data (49% vs. 33%) possibly reflects the heterogeneity of tumor samples compared to cell lines. However, we cannot exclude that this difference is due to the extent of the interactome coverage by the proteomic dataset.

3.2 The increasing and the decreasing expression tendency of dysregulated modules
In the BC dataset, by applying the Jonckheere's Trend test to the KW dysregulated modules, we detected a significantly increasing expression for 71 modules (51%) across the stages, from normal to metastatic, whereas 18 (13%) showed a decreasing expression during progression. The remaining 49 dysregulated modules did not show any significant dysregulation trend. The modules with increasing expression are involved in functions supporting sustained proliferation, such as cell cycle, DNA replication, recombination and repair, mRNA splicing, transcription regulation, and several signaling pathways (e.g., events mediated by Notch, SMAD2/3 and P53). In contrast, the modules with the opposite trend reveal a decrease in the expression of proteins acting in focal adhesion, regulation of actin cytoskeleton, extracellular matrix receptor interactions, and cell surface receptor and EGFR signaling (Supplementary Data B.1 and B.2, Supplementary Tables S3 and S5). The decreased expression of these cellular functions, which are involved in the maintenance of tissue integrity, is a hallmark of cancer development leading to metastasis [47].

8 9
In the CRC dataset, the JT test detected for 19 (24,6%) and 5 (6,5%) modules a
10
significant increasing or decreasing expression, respectively. Here again, the
11
increasing trend is associated with cell proliferation functions such as mRNA splicing
12
and spliceosome, DNA repair and Notch and SMAD2/3 signaling. On the other hand,
13
modules displaying a decreasing trend are annotated with functions related to cell
14
cycle arrest, protein degradation via the proteasome, and antigen
15
processing/presentation, the latter being related to immune surveillance evasion
16
(Supplementary Data B.3 and B.4, Supplementary Table S4 and S6), an emerging
17
cancer hallmark [47].
18 19
Interestingly, 29 modules of the interactome are found dysregulated using both proteomic datasets (Supplementary Tables S3 and S4). Eighty percent of these (23/29) show the same trend according to the JT test, among which 39% (i.e., 9/23) reach statistical significance. These modules denote biological processes commonly dysregulated in both cancers, essentially RNA splicing and DNA damage response, which increase with cancer progression, and protein degradation, which decreases (see Section 3.4). Analogously, the identification of modules dysregulated in only one cancer could be highly informative; however, it might be influenced by the difference in coverage between the two proteomes and may be due to a lack of data for the proteins of a module in one of the two datasets.

3.3 Two types of stage-specific modules
We have identified the specific stage(s) showing the most significant changes in expression among dysregulated modules without any trend using the Dunn test (see Section 2.5.3). Twenty-six percent (BC) and 47% (CRC) of these modules showed a significantly higher expression in one or two specific stages (Figure 2, Supplementary Tables S3 and S4). Notably, the highest significant expression is detected in Stage II for the majority of those modules (9/13, i.e., 69%, and 23/25, i.e., 92%, for BC and CRC, respectively). This corresponds to an expression burst, which is not maintained through the following stages in most cases (for instance, see Modules 319 and 386 in Supplementary Figure S6). Modules in this category are involved in signaling, regulation of actin cytoskeleton, focal adhesion and extracellular matrix receptor interaction.

Conversely, a significantly lower expression in one or two stages is revealed in 32.6% and 39.6% of the dysregulated modules for BC and CRC, respectively (Figure 2, Supplementary Table). Very interestingly, for most of these dysregulated modules, the normal stage is the lowest expressed one, whereas the other stages have higher but similar and even expression distributions (see, for example, Modules 201 and 309 in Supplementary Figure S6). This corresponds to a sharp increase in expression in these modules in the premalignant stage, which is further maintained through cancer progression. These modules are dedicated to mRNA splicing, DNA replication and particularly to the toll-like receptor pathways, which are related to tumor-promoting inflammation [48].

Overall, using the Dunn test allowed us to distinguish cellular functions dysregulated in one particular stage from those dysregulated as early as Stage I and remaining so throughout cancer progression.

3.4 An example of dysregulated module in both BC and CRC
In the colorectal dataset, 4 network modules related to the proteasome show a significantly decreasing trend during progression (see Supplementary Table S4). One of these (i.e., Module 780) also shows the same significant trend in breast cancer (Figure 3). Cancer cells exploit the ubiquitin-proteasome system for their growth [49]. For this reason, specific inhibitors have been developed to target the proteasome machinery in different cancer types [50–52]. However, reports have highlighted that low proteasome activity (i) is a distinct feature of cancer cells with a high self-renewal capacity and a stem-like phenotype [53–57] and (ii) has been associated with decreased survival in head and neck cancer patients treated with radiotherapy [58]. Our results are consistent with this scenario: tumor samples in the CRC dataset come from patients who received radiotherapy treatment [25], and low proteasome activity has recently been observed in a sub-population of certain breast cancer cell lines [55].

4. Conclusions
We have proposed a novel computational method for integrating quantitative proteomics and protein interaction data, based on network clustering and statistical analyses. We have shown that our approach is able to identify relevant dysregulated cellular functions involved in cancer progression. The method is modular in its conception and can be adapted to any network. Furthermore, it does not depend on the network clustering algorithm used to identify functional modules. More importantly, the method is generic because any proteomic dataset measuring changes during the progression of a phenomenon of interest, such as disease progression, cancer progression or infection kinetics, can be studied. In particular, proteomic data obtained from a single patient over time could also be analyzed with our approach in a personalized medicine setting.

Acknowledgements
The authors received financial support from the French “Plan Cancer 2009-2013” (Systems Biology call, A12171AS). The authors thank Anaïs Baudot and Elisabeth Remy (I2M, CNRS, Marseille), Lionel Spinelli (TAGC, Aix-Marseille University, Marseille), Luc Camoin (CRCM, Inserm, Marseille) and all the partners of the Hsp27BioSys project for fruitful discussions.

References
[1] T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications., Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 10869–74. [2] T. Sorlie, R. Tibshirani, J. Parker, T. Hastie, J.S. Marron, A. Nobel, et al., Repeated observation of breast tumor subtypes in independent gene expression data sets., Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 8418–23. [3] J. Lapointe, C. Li, J.P. Higgins, M. van de Rijn, E. Bair, K. Montgomery, et al., Gene expression profiling identifies clinically relevant subtypes of prostate cancer., Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 811–6. doi:10.1073/pnas.0304146101. [4] Y. Wang, J.G.M. Klijn, Y. Zhang, A.M. Sieuwerts, M.P. Look, F. Yang, et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer., Lancet. 365 (2005) 671–9. [5] A. Naderi, A.E. Teschendorff, N.L. Barbosa-Morais, S.E. Pinder, A.R. Green, D.G. Powe, et al., A gene-expression signature to predict survival in breast cancer across independent data sets., Oncogene. 26 (2007) 1507–16. [6] T. Iwamoto, L. Pusztai, Predicting prognosis of breast cancer with gene signatures: are we lost in a sea of data?, Genome Med. 2 (2010) 81. [7] L.I. Furlong, Human diseases through the lens of network biology., Trends Genet. TIG. 29 (2013) 150–9. doi:10.1016/j.tig.2012.11.004.
[8] K.M. Mani, C. Lefebvre, K. Wang, W.K. Lim, K. Basso, R. Dalla-Favera, et al., A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas., Mol. Syst. Biol. 4 (2008) 169. doi:10.1038/msb.2008.2. [9] H.-Y. Chuang, E. Lee, Y.-T. Liu, D. Lee, T. Ideker, Network-based classification of breast cancer metastasis., Mol. Syst. Biol. 3 (2007) 140. doi:10.1038/msb4100180. [10] T. Maier, M. Güell, L. Serrano, Correlation of mRNA and protein in complex biological samples, FEBS Lett. 583 (2009) 3966–3973. doi:10.1016/j.febslet.2009.10.036. [11] M. Uhlén, L. Fagerberg, B.M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, et al., Proteomics. Tissue-based map of the human proteome, Science. 347 (2015) 1260419. doi:10.1126/science.1260419. [12] C. Kiel, A. Vogt, A. Campagna, A. Chatr-aryamontri, M. Swiatek-de Lange, M. Beer, et al., Structural and functional protein network analyses predict novel signaling functions for rhodopsin, Mol. Syst. Biol. 7 (2011) 551. doi:10.1038/msb.2011.83. [13] S.J. Orr, D.R. Boutz, R. Wang, C. Chronis, N.C. Lea, T. Thayaparan, et al., Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry, Mol. Syst. Biol. 8 (2012) 573. doi:10.1038/msb.2012.5. [14] J. Cox, M. Mann, Quantitative, high-resolution proteomics for data-driven systems biology., Annu. Rev. Biochem. 80 (2011) 273–99. [15] T. Geiger, S.F. Madden, W.M. Gallagher, J. Cox, M. Mann, Proteomic portrait of human breast cancer progression identifies novel prognostic markers., Cancer Res. 72 (2012) 2428–2439. doi:10.1158/0008-5472.CAN-11-3711. [16] B. Zhang, J. Wang, X. Wang, J. Zhu, Q. Liu, Z. Shi, et al., Proteogenomic characterization of human colon and rectal cancer, Nature. 513 (2014) 382– 387. doi:10.1038/nature13438. [17] L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, From molecular to modular cell biology., Nature. 402 (1999) C47–52. doi:10.1038/35011540. [18] A.-L. Barabási, Z.N. 
Oltvai, Network biology: understanding the cell’s functional organization., Nat. Rev. Genet. 5 (2004) 101–113. [19] C. Pizzuti, S.E. Rombo, Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods, Bioinforma. Oxf. Engl. 30 (2014) 1343–1352. doi:10.1093/bioinformatics/btu034. [20] K.I. Goh, M.E. Cusick, D. Valle, B. Childs, M. Vidal, A.L. Barabasi, The human disease network, Proc Natl Acad Sci U A. 104 (2007) 8685–90. [21] J. Menche, A. Sharma, M. Kitsak, S.D. Ghiassian, M. Vidal, J. Loscalzo, et al., Disease networks. Uncovering disease-disease relationships through the incomplete interactome, Science. 347 (2015) 1257601. doi:10.1126/science.1257601. [22] E.E. Schadt, Molecular networks as sensors and drivers of common human diseases., Nature. 461 (2009) 218–23. doi:10.1038/nature08454. [23] A. Zanzoni, M. Soler-López, P. Aloy, A network medicine approach to human disease, FEBS Lett. 583 (2009) 1759–1765. doi:10.1016/j.febslet.2009.03.001. [24] A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat. Rev. Genet. 12 (2011) 56–68. doi:10.1038/nrg2918. [25] N.J. Edwards, M. Oberti, R.R. Thangudu, S. Cai, P.B. McGarvey, S. Jacob, et al., The CPTAC Data Portal: A Resource for Cancer Proteomics Research, J. Proteome Res. (2015). doi:10.1021/pr501254j. [26] E. Becker, B. Robisson, C.E. Chapple, A. Guénoche, C. Brun, Multifunctional proteins revealed by overlapping clustering in protein interaction network, Bioinforma. Oxf. Engl. 28 (2012) 84–90. doi:10.1093/bioinformatics/btr621.
[27] C.E. Chapple, B. Robisson, L. Spinelli, C. Guien, E. Becker, C. Brun, Extreme multifunctional proteins identified from a human protein interaction network, Nat. Commun. 6 (2015) 7412. doi:10.1038/ncomms8412. [28] C. Prieto, J. De Las Rivas, APID: Agile Protein Interaction DataAnalyzer, Nucleic Acids Res. 34 (2006) W298–302. doi:10.1093/nar/gkl128. [29] C. Stark, B.-J. Breitkreutz, A. Chatr-Aryamontri, L. Boucher, R. Oughtred, M.S. Livstone, et al., The BioGRID Interaction Database: 2011 update, Nucleic Acids Res. 39 (2011) D698–704. doi:10.1093/nar/gkq1116. [30] L. Salwinski, C.S. Miller, A.J. Smith, F.K. Pettit, J.U. Bowie, D. Eisenberg, The Database of Interacting Proteins: 2004 update., Nucleic Acids Res. 32 (2004) D449–51. doi:10.1093/nar/gkh086. [31] S. Kerrien, B. Aranda, L. Breuza, A. Bridge, F. Broackes-Carter, C. Chen, et al., The IntAct molecular interaction database in 2012, Nucleic Acids Res. 40 (2012) D841–846. doi:10.1093/nar/gkr1088. [32] K. Breuer, A.K. Foroushani, M.R. Laird, C. Chen, A. Sribnaia, R. Lo, et al., InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation, Nucleic Acids Res. 41 (2013) D1228–1233. doi:10.1093/nar/gks1147. [33] E. Chautard, M. Fatoux-Ardore, L. Ballut, N. Thierry-Mieg, S. Ricard-Blum, MatrixDB, the extracellular matrix interaction database, Nucleic Acids Res. 39 (2011) D235–240. doi:10.1093/nar/gkq830. [34] L. Licata, L. Briganti, D. Peluso, L. Perfetto, M. Iannuccelli, E. Galeota, et al., MINT, the molecular interaction database: 2012 update, Nucleic Acids Res. 40 (2012) D857–861. doi:10.1093/nar/gkr930. [35] R. Elkon, R. Vesterman, N. Amit, I. Ulitsky, I. Zohar, M. Weisz, et al., SPIKE--a database, visualization and analysis tool of cellular signaling pathways, BMC Bioinformatics. 9 (2008) 110. doi:10.1186/1471-2105-9-110. [36] P.F. Lange, C.M. Overall, TopFIND, a knowledgebase linking protein termini with function, Nat. Methods. 8 (2011) 703–704. doi:10.1038/nmeth.1669. 
[37] B. Aranda, H. Blankenburg, S. Kerrien, F.S.L. Brinkman, A. Ceol, E. Chautard, et al., PSICQUIC and PSISCORE: accessing and scoring molecular interactions, Nat. Methods. 8 (2011) 528–529. doi:10.1038/nmeth.1637. [38] J.-F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, et al., Towards a proteome-scale map of the human protein-protein interaction network, Nature. 437 (2005) 1173–1178. doi:10.1038/nature04209. [39] R. Mosca, A. Céol, P. Aloy, Interactome3D: adding structural details to protein networks, Nat. Methods. 10 (2013) 47–53. doi:10.1038/nmeth.2289. [40] L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: accelerated for clustering the nextgeneration sequencing data, Bioinforma. Oxf. Engl. 28 (2012) 3150–3152. doi:10.1093/bioinformatics/bts565. [41] The Gene Ontology Consortium, The Gene Ontology in 2010: extensions and refinements., Nucleic Acids Res. 38 (2010) D331–5. doi:10.1093/nar/gkp1018. [42] D. Barrell, E. Dimmer, R.P. Huntley, D. Binns, C. O’Donovan, R. Apweiler, The GOA database in 2009–an integrated Gene Ontology Annotation resource., Nucleic Acids Res. 37 (2009) D396–403. doi:10.1093/nar/gkn803. [43] M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, M. Tanabe, KEGG for integration and interpretation of large-scale molecular data sets., Nucleic Acids Res. 40 (2012) D109–114. doi:10.1093/nar/gkr988. [44] D. Croft, G. O’Kelly, G. Wu, R. Haw, M. Gillespie, L. Matthews, et al., Reactome: a database of reactions, pathways and biological processes, Nucleic Acids Res. 39 (2011) D691–697. doi:10.1093/nar/gkq1018. [45] C.F. Schaefer, K. Anthony, S. Krupa, J. Buchoff, M. Day, T. Hannay, et al., PID: the Pathway Interaction Database, Nucleic Acids Res. 37 (2009) D674–679. doi:10.1093/nar/gkn653.
[46] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 15545–15550. doi:10.1073/pnas.0506580102.
[47] D. Hanahan, R.A. Weinberg, Hallmarks of Cancer: The Next Generation, Cell. 144 (2011) 646–674. doi:10.1016/j.cell.2011.02.013.
[48] J.-P. Pradere, D.H. Dapito, R.F. Schwabe, The Yin and Yang of Toll-like receptors in cancer, Oncogene. 33 (2014) 3485–3495. doi:10.1038/onc.2013.302.
[49] A. Mani, E.P. Gelmann, The Ubiquitin-Proteasome Pathway and Its Role in Cancer, J. Clin. Oncol. 23 (2005) 4776–4789. doi:10.1200/JCO.2005.05.081.
[50] L.J. Crawford, B. Walker, A.E. Irvine, Proteasome inhibitors in cancer therapy, J. Cell Commun. Signal. 5 (2011) 101–110. doi:10.1007/s12079-011-0121-7.
[51] N. Rastogi, D.P. Mishra, Therapeutic targeting of cancer cell cycle using proteasome inhibitors, Cell Div. 7 (2012) 26. doi:10.1186/1747-1028-7-26.
[52] M. Shen, S. Schmitt, D. Buac, Q.P. Dou, Targeting the ubiquitin-proteasome system for cancer therapy, Expert Opin. Ther. Targets. 17 (2013) 1091–1108. doi:10.1517/14728222.2013.815728.
[53] C. Lagadec, E. Vlashi, L. Della Donna, Y. Meng, C. Dekmezian, K. Kim, et al., Survival and self-renewing capacity of breast cancer initiating cells during fractionated radiation treatment, Breast Cancer Res. BCR. 12 (2010) R13. doi:10.1186/bcr2479.
[54] J. Pan, Q. Zhang, Y. Wang, M. You, 26S Proteasome Activity Is Down-Regulated in Lung Cancer Stem-Like Cells Propagated In Vitro, PLoS ONE. 5 (2010) e13298. doi:10.1371/journal.pone.0013298.
[55] E. Vlashi, C. Lagadec, M. Chan, P. Frohnen, A.J. McDonald, F. Pajonk, Targeted elimination of breast cancer cells with low proteasome activity is sufficient for tumor regression, Breast Cancer Res. Treat. 141 (2013) 197–203. doi:10.1007/s10549-013-2688-6.
[56] K. Munakata, M. Uemura, J. Nishimura, T. Hata, I. Takemasa, T. Mizushima, et al., Abstract 858: Treatment resistance of colon cancer with low proteasome activity, Cancer Res. 74 (2014) 858–858. doi:10.1158/1538-7445.AM2014-858.
[57] M. Uemura, K. Munakata, J. Nishimura, T. Hata, I. Takemasa, T. Mizushima, et al., Abstract 1401: Low proteasome activity and cancer stemness in colorectal cancer, Cancer Res. 75 (2015) 1401–1401. doi:10.1158/1538-7445.AM2015-1401.
[58] C. Lagadec, E. Vlashi, S. Bhuta, C. Lai, P. Mischel, M. Werner, et al., Tumor cells with low proteasome subunit expression predict overall survival in head and neck cancer patients, BMC Cancer. 14 (2014) 152. doi:10.1186/1471-2407-14-152.
[59] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res. 13 (2003) 2498–2504. doi:10.1101/gr.1239303.
Figure Captions

Figure 1. Workflow of the proposed computational approach that combines quantitative proteomics data with protein interaction analysis to identify significantly dysregulated cellular functions during cancer progression.

Figure 2. Results summary of the statistical analysis to detect significantly dysregulated modules during breast (upper panel) and colorectal (lower panel) cancer progression.

Figure 3. The network module 780 shows a significantly decreasing trend of expression in both BC and CRC. (A) Expression profile of module 780 during BC progression. Proteins with expression data are depicted as circles, whereas proteins for which expression data are missing are represented as grey diamonds. Positive and negative expression Z-scores are reported as shades of violet and orange, respectively. (B) Expression profile of module 780 during CRC progression. Color coding as in (A). Network module representations were generated using Cytoscape [59].

Appendix A. Supplementary Data

Supplementary Data A. Supplementary methods for the data pre-processing of the proteomic profiling experiments used in this study.

Supplementary Data B. Dysregulated modules with the strongest significant trend.

Supplementary Tables. The human interactome used in this study; the network modules detected by the OCG algorithm; dysregulated modules in breast and colorectal cancer datasets; functional annotations of dysregulated network modules.

Supplementary Figure 1. Significantly dysregulated network modules with an increasing trend in expression in BC.

Supplementary Figure 2. Significantly dysregulated network modules with a decreasing trend in expression in BC.

Supplementary Figure 3. Significantly dysregulated network modules with no significant trend in expression in BC.

Supplementary Figure 4. Significantly dysregulated network modules with an increasing trend in expression in CRC.

Supplementary Figure 5. Significantly dysregulated network modules with a decreasing trend in expression in CRC.

Supplementary Figure 6. Significantly dysregulated network modules with no significant trend in expression in CRC.

• We propose a network-based method to study cancer progression using proteomics data
• We found network modules showing distinct dysregulation trends during progression
• Our approach can be applied to any stage-based or time-resolved proteomics dataset