Integration of quantitative proteomics data and interaction networks: Identification of dysregulated cellular functions during cancer progression



Accepted Manuscript. Andreas Zanzoni, Christine Brun

PII: S1046-2023(15)30089-X
DOI: http://dx.doi.org/10.1016/j.ymeth.2015.09.014
Reference: YMETH 3795

To appear in: Methods

Received Date: 14 June 2015
Revised Date: 2 September 2015
Accepted Date: 14 September 2015

Please cite this article as: A. Zanzoni, C. Brun, Integration of quantitative proteomics data and interaction networks: identification of dysregulated cellular functions during cancer progression, Methods (2015), doi: http://dx.doi.org/10.1016/j.ymeth.2015.09.014

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Andreas Zanzoni1,2 and Christine Brun1,2,3,*

1 Inserm, UMR_S1090 TAGC, Marseille, F-13288, France
2 Aix-Marseille Université, UMR_S1090 TAGC, Marseille, F-13288, France
3 CNRS, Marseille, F-13402, France

* Correspondence: Dr. Christine Brun, TAGC UMR_S1090, Inserm, Aix-Marseille Université, Marseille, France. Tel: +33 4 91 82 87 12. Fax: +33 4 91 82 87 01. E-mail: [email protected] mrs.fr.

Abstract

Quantitative proteomics allows the characterization of molecular changes between healthy and disease states. To interpret such datasets, integrating them with the protein-protein interaction network provides a more comprehensive understanding of cellular function dysregulation in diseases than considering lists of dysregulated proteins alone. Here, we propose a novel computational method that combines protein interaction network and statistical analyses to establish expression profiles at the network module level rather than at the individual protein level, and to detect and characterize dysregulated network modules through different stages of cancer progression. We applied our approach to two publicly available datasets as case studies.

Keywords: bioinformatics, quantitative proteomics, protein interaction, network modules, cancer progression

1. Introduction

High-throughput technologies such as gene expression profiling have enabled the characterization of molecular changes between healthy and disease states. They have led to the identification of distinct phenotypic classes and stages, which are crucial for therapeutic intervention [1–5]. Notwithstanding these advances, the clinical applicability of the resulting plethora of prognostic signatures remains limited [6].

Integrating protein interaction data with omics data such as gene expression profiles can help identify molecular perturbations that might be implicated in disease. Several computational methods have been developed for this purpose [7], some of which have been applied in cancer biology to discover novel genes related to B-cell lymphomas [8] and to improve breast cancer classification [9]. However, these approaches rely on the assumption that the expression of a gene correlates with that of its corresponding protein(s), although many mechanisms may uncouple transcription from translation [10]. Considering protein levels is therefore more appropriate, since they reflect the cell phenotype more precisely [11]. Indeed, recent reports illustrate the benefit of integrating protein expression and interaction data to predict, for instance, novel functions for the G protein-coupled receptor rhodopsin in photoreceptor cells [12] or to identify the cellular processes involved in cell cycle entry in human T lymphocytes [13]. Furthermore, current quantitative proteomics technologies enable the identification and quantification of thousands of proteins and are paving the way to a systems-wide analysis of proteomes [14]. This is particularly relevant for the study of human diseases such as cancer, in which cell proteomes are reshaped during disease progression, leading to or resulting from the dysregulation of cellular processes. As the proteomic landscapes of cancer cell lines and tumor samples at different stages become available [15,16], it is crucial to develop computational network-based approaches to analyze high-resolution cancer proteomics data.

To this end, we present a computational network-based framework to identify significantly dysregulated cellular functions during disease progression by combining quantitative proteomics data with protein interaction analysis. Proteins seldom act alone; rather, they perform their functions by interacting with other proteins in macromolecular complexes and in metabolic and signaling pathways [17,18]. Considering these cellular functional units rather than individual proteins should lead to a better grasp of the expression dysregulation leading to disease. We therefore propose a “functional modules” approach, as such modules provide a formal framework to investigate and infer protein cellular functions as well as their possible perturbations. Functional modules can be identified from protein interaction networks using tailored computational methods (reviewed in [19]). Based on a guilt-by-association principle, they allow the cellular functions of their uncharacterized protein components to be inferred. Furthermore, they can contain several interacting disease-related proteins [20,21], and alterations in one or more of these players may propagate to other components, eventually leading to cell network perturbations and disease phenotypes [22–24]. Integrating protein expression at this level of organization of the protein network thus appears particularly relevant for a better understanding of cellular function dysregulation in diseases.

Here, we propose a novel computational method that combines protein interaction network and statistical analyses to interpret quantitative proteomics data. Overall, the strength of our approach is four-fold: (i) it provides an integrated cellular context by taking into account the expression profile at the network module level; (ii) it exploits protein expression values, which better describe cell phenotypes; (iii) it detects statistically dysregulated network modules through different stages of cancer progression; (iv) module members for which no proteomic data are available represent, by virtue of their participation in these modules, candidate cancer proteins that are possibly implicated in cancer progression and are potential biomarkers or therapeutic targets. We applied our approach to two publicly available datasets as case studies: a panel of 11 cell lines recapitulating breast cancer progression [15] and a set of 155 normal and tumor colorectal samples taken from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal [25].

2. Materials and Methods

2.1 Overview

Our approach consists of the following steps (Figure 1): (i) generation of a human binary protein interaction network by gathering publicly available data; (ii) detection of network modules using the OCG (Overlapping Cluster Generator) algorithm [26]; (iii) functional annotation of the network modules; (iv) integration of quantitative proteomics data; (v) identification of network modules showing a statistically significant change in protein expression during progression, using non-parametric analysis of variance methods. Each step is described in detail in the following subsections.

2.2 Interactome construction

We used the human interactome that we built and described previously [27]. Briefly, protein interaction data were retrieved (February 2013) from several databases [28–36] through the PSICQUIC query interface [37]. Only binary interactions likely to be direct according to the experimental detection method [38,39] were kept. To reduce the redundancy among TrEMBL and SwissProt protein entries, the sequences were clustered using CD-HIT [40], and TrEMBL/SwissProt protein pairs sharing at least 95% sequence similarity were considered to be the same protein. Interactions assigned to the TrEMBL entry were then transferred to the SwissProt entry. In this way, we obtained a human binary interactome containing 74,388 interactions between 12,865 proteins (Supplementary Table S1).
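The transfer of interactions from redundant TrEMBL entries to SwissProt entries can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the CD-HIT clusters have already been converted into a hypothetical `trembl_to_swissprot` dictionary (TrEMBL accession to its SwissProt counterpart), and the accessions in the example are placeholders rather than real data. Interactions are treated as undirected, so each pair is stored in sorted order, which also removes the duplicates created by the merge; self-interactions are kept as-is.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def remap_interactions(
    interactions: Iterable[Edge],
    trembl_to_swissprot: Dict[str, str],
) -> Set[Edge]:
    """Transfer interactions from TrEMBL entries to their SwissProt entries.

    `trembl_to_swissprot` is assumed to come from CD-HIT clusters in which a
    TrEMBL and a SwissProt sequence share >= 95% similarity.
    """
    merged: Set[Edge] = set()
    for a, b in interactions:
        # Replace each accession by its SwissProt counterpart, if it has one.
        a = trembl_to_swissprot.get(a, a)
        b = trembl_to_swissprot.get(b, b)
        # Store undirected edges in canonical (sorted) order to drop duplicates.
        merged.add((a, b) if a <= b else (b, a))
    return merged


if __name__ == "__main__":
    # Placeholder accessions, for illustration only.
    edges = [("P1", "T9"), ("S2", "P1"), ("P1", "S2")]
    print(sorted(remap_interactions(edges, {"T9": "S2"})))
```

Counting the distinct accessions in the returned edge set would then give the number of proteins in the resulting interactome.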

2.3 Network module identification and annotation

We identified network modules using the OCG algorithm with default parameters [26], obtaining a collection of 855 modules ranging in size from 2 to 306 proteins (Supplementary Table S2). This algorithm decomposes a network into overlapping modules based on modularity optimization [26].

We functionally annotated these network modules by assessing the over-representation of Gene Ontology (GO) biological process terms [41] (downloaded from the GOA website [42], April 2013) and of cellular pathways taken from the KEGG [43], Reactome [44] and NCI-PID [45] databases (downloaded from MSigDB [46], April 2013). We considered only GO terms and pathways having at least 5, and no more than 500, annotated proteins in the human proteome. Enrichment P-values were computed using a one-sided Fisher's exact test and corrected for multiple testing with the Benjamini-Hochberg procedure (significance threshold α = 1 × 10^-3). As the background reference, we used the annotated proteins in the human interactome.
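The enrichment step can be illustrated with a minimal, standard-library Python sketch; it is not the authors' implementation, and the function and argument names are hypothetical. `fisher_one_sided` computes the one-sided Fisher's exact P-value as the upper tail of the hypergeometric distribution, i.e., the probability of drawing at least `k` term-annotated proteins in a module of that size from the background. `benjamini_hochberg` implements the standard step-up adjustment, intended to match R's `p.adjust(method = "BH")`.

```python
from math import comb
from typing import List


def fisher_one_sided(k: int, module_size: int, term_size: int, background: int) -> float:
    """One-sided (over-representation) Fisher's exact test.

    k           -- module proteins annotated with the term
    module_size -- annotated proteins in the module
    term_size   -- background proteins annotated with the term
    background  -- annotated proteins in the reference set (here, the interactome)
    Returns P(X >= k) for a hypergeometric random variable X.
    """
    total = comb(background, module_size)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(
        comb(term_size, x) * comb(background - term_size, module_size - x)
        for x in range(k, min(module_size, term_size) + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under the criteria above, a term would be reported for a module when its adjusted P-value is below 1 × 10^-3, after discarding terms with fewer than 5 or more than 500 annotated proteins in the human proteome.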

Finally, to maximize the annotation coverage, we further characterized network modules by assessing their functional homogeneity based on GO biological process terms, as described in [27]. Briefly, a GO term is assigned to a network module if at least 50% of its proteins share that GO term.
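The homogeneity rule can be sketched as below. It assumes a hypothetical `go_bp` dictionary mapping each protein to its set of GO biological process terms, and it reads "at least 50% of its proteins" as 50% of all module members, including unannotated ones; if only annotated members were meant, the denominator used for `threshold` would change accordingly.

```python
from collections import Counter
from typing import Dict, List, Set


def homogeneous_terms(
    module: Set[str],
    go_bp: Dict[str, Set[str]],
    min_fraction: float = 0.5,
) -> List[str]:
    """GO terms shared by at least `min_fraction` of the module's proteins."""
    # Count, for each term, how many module members carry it.
    counts = Counter(term for protein in module for term in go_bp.get(protein, set()))
    threshold = min_fraction * len(module)
    return sorted(term for term, n in counts.items() if n >= threshold)
```

For a four-protein module in which two members carry the same term, that term reaches the 50% threshold and is assigned to the module.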

2.4 Proteomics data processing and integration

Our approach takes protein expression data matrices as input (see Supplementary Data A). Only proteins present in the human interactome are considered. To estimate the expression across the different stages, we computed a Z-score of the expression values for each protein i as follows:

$$Z_{i,j} = \frac{x_{i,j} - \mu_i}{\sigma_i}$$

where $x_{i,j}$ is the expression value of protein i in stage j, $Z_{i,j}$ is the Z-score of protein i in stage j, and $\mu_i$ and $\sigma_i$ are the mean expression value and the standard deviation of protein i across the stages, respectively. Finally, the Z-scores of all individual proteins belonging to the same network module are assembled into an expression distribution for every stage, thereby generating an expression profile for every module during progression (Figure 1).
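The Z-score transformation and the assembly of per-stage module distributions can be sketched as follows. The input format (a dictionary mapping each protein to its expression values ordered by stage) is an assumption, as is the use of the population standard deviation (`statistics.pstdev`) for σ_i, since the text does not state which estimator was used. Proteins with a constant profile (σ_i = 0) are skipped here because their Z-score is undefined; how the original pipeline handled them is not stated.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Sequence


def zscore_rows(expr: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Z_ij = (x_ij - mu_i) / sigma_i for each protein i across stages j."""
    z: Dict[str, List[float]] = {}
    for protein, values in expr.items():
        mu = mean(values)
        sigma = pstdev(values)  # population SD: an assumption
        if sigma == 0:
            continue  # constant profile: Z-score undefined, protein skipped
        z[protein] = [(x - mu) / sigma for x in values]
    return z


def module_profile(module: Iterable[str], z: Dict[str, List[float]]) -> List[List[float]]:
    """One Z-score distribution per stage, pooling the module's quantified proteins."""
    members = [p for p in sorted(module) if p in z]
    if not members:
        return []
    n_stages = len(z[members[0]])
    return [[z[p][j] for p in members] for j in range(n_stages)]
```

The list returned by `module_profile` is the kind of per-stage input the statistical tests of Section 2.5 operate on.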

2.5 Module dysregulation assessment

Network modules containing at least 5 proteins and having quantitative expression data available for at least 50% of their components were selected for statistical analysis. We performed three non-parametric tests to assess module dysregulation. All tests were performed in the R statistical environment, and the resulting P-values were corrected for multiple testing with the Benjamini-Hochberg procedure.
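The module selection criteria amount to a simple filter. In this sketch, `modules` (module name to member set) and `quantified` (proteins with expression data) are hypothetical inputs.

```python
from typing import Dict, Iterable, List, Set


def select_modules(
    modules: Dict[str, Set[str]],
    quantified: Iterable[str],
    min_size: int = 5,
    min_coverage: float = 0.5,
) -> List[str]:
    """Modules with >= min_size proteins, >= min_coverage of them quantified."""
    measured = set(quantified)
    selected = []
    for name, members in modules.items():
        if len(members) < min_size:
            continue  # too small for a meaningful distribution
        coverage = len(members & measured) / len(members)
        if coverage >= min_coverage:
            selected.append(name)
    return selected
```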

2.5.1 Identification of dysregulated modules

We first applied the Kruskal-Wallis (KW) test, a non-parametric analysis of variance method, which compares, for every module, its expression distributions (one for each stage of progression). The KW test assesses whether at least one distribution “dominates” the others, meaning that it differs significantly from the other distributions. A module was considered significantly dysregulated when the corrected P-value was smaller than 0.05 (two-sided KW test).
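To show what the KW test computes, here is a standard-library Python sketch (the authors used R). It ranks all Z-scores of a module jointly, giving tied values their average rank, computes the tie-corrected H statistic, and converts it to a P-value with the chi-square approximation on k − 1 degrees of freedom, which is also what R's `kruskal.test` reports. The chi-square tail is obtained here from the series expansion of the regularized incomplete gamma function; because it is computed as 1 minus the lower tail, very small P-values lose precision, so a library routine such as R's `kruskal.test` or `scipy.stats.kruskal` is preferable in practice. The sketch assumes non-empty groups whose values are not all identical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper tail, 1 - P(df/2, x/2), using the series for the
    regularized lower incomplete gamma function P."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    while abs(term) > abs(total) * 1e-15:
        denom += 1.0
        term *= y / denom
        total += term
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square P-value (k - 1 df)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_sq, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_sq += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * sum_sq - 3.0 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if ties:
        h /= 1.0 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)
```

Applied to a module, `groups` would be its per-stage Z-score distributions; the resulting P-values across modules would then be Benjamini-Hochberg corrected and compared with 0.05.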

2.5.2 Dysregulated modules with increasing and decreasing expression trends across stages

We used the one-sided Jonckheere's Trend (JT) test (R package: SAGx) to detect, among the modules found dysregulated by the KW test, those with significantly increasing or decreasing expression. To do so, the JT test takes into account the ordering of the stages, from the initial stage (e.g., normal) to the most advanced one (e.g., metastasis). We considered a trend significant when the corrected P-value was smaller than 0.025.
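The JT statistic counts, over every pair of stages (earlier, later), how often a value from the earlier stage is smaller than a value from the later one, with ties counting 1/2. The sketch below uses the large-sample normal approximation without a tie correction in the variance; the SAGx implementation used by the authors may differ in such details, so this is an illustration rather than a drop-in replacement.

```python
import math
from typing import Sequence, Tuple


def jonckheere_terpstra(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Jonckheere-Terpstra test, normal approximation, no tie correction in the variance.

    `groups` must be ordered from the initial to the most advanced stage.
    Returns (J, z, p_increasing, p_decreasing); both P-values are one-sided.
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    # A later-stage value exceeding an earlier one supports an increase.
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n ** 2 - sum(s ** 2 for s in sizes)) / 4.0
    var = (n ** 2 * (2 * n + 3) - sum(s ** 2 * (2 * s + 3) for s in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(var)
    p_increasing = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z)
    p_decreasing = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)
    return j_stat, z, p_increasing, p_decreasing
```

After correction, each of the two one-sided P-values would be compared with the 0.025 threshold, which keeps the overall two-directional error rate at the 0.05 level used for the KW test.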

2.5.3 Stage-specific dysregulated modules during progression

The KW test does not pinpoint which distribution(s) of a profile differ significantly, i.e., the dominant stage(s), if any. To overcome this limitation, we applied the post-hoc Dunn test (DT) (R package: dunn.test), which performs multiple pairwise comparisons between the expression distributions of a given module. We defined a module as dysregulated in a stage-specific manner if: (i) the expression distribution of a given stage is significantly greater than that of at least 75% of the other stages according to the DT (corrected P-value < 0.05); and (ii) the module does not show any significant dysregulation trend according to the JT test.
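Dunn's test compares the mean ranks of pairs of stages, computed on the same pooled ranking as the KW test, using a tie-adjusted standard error. The sketch below returns pairwise z statistics and one-sided "greater" P-values, and encodes the stage-specific rule of this section. It does not reproduce the P-value conventions of the dunn.test package; it assumes the caller corrects the pairwise P-values (e.g., with Benjamini-Hochberg, as in Section 2.5) before calling `is_stage_specific`, and that the hypothetical `has_trend` flag comes from the JT test.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_z(groups: Sequence[Sequence[float]]) -> Dict[Tuple[int, int], float]:
    """Pairwise Dunn z statistics; z[(a, b)] > 0 means stage a ranks above stage b."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    # Rank variance with tie adjustment: n(n+1)/12 - sum(t^3 - t) / (12 (n - 1)).
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = n * (n + 1) / 12.0 - ties / (12.0 * (n - 1))
    z = {}
    for a in range(len(groups)):
        for b in range(len(groups)):
            if a != b:
                se = math.sqrt(variance * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
                z[(a, b)] = (mean_ranks[a] - mean_ranks[b]) / se
    return z


def p_greater(z: float) -> float:
    """One-sided P-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_stage_specific(
    stage: int,
    corrected_p: Dict[Tuple[int, int], float],
    n_stages: int,
    has_trend: bool,
    alpha: float = 0.05,
    min_fraction: float = 0.75,
) -> bool:
    """Stage significantly greater than >= 75% of the other stages, and no JT trend."""
    if has_trend:
        return False
    wins = sum(
        1
        for other in range(n_stages)
        if other != stage and corrected_p[(stage, other)] < alpha
    )
    return wins >= min_fraction * (n_stages - 1)
```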

3. Results and Discussion

We applied the proposed method to two quantitative proteomic datasets. The first comprises the proteomes of 11 cell lines recapitulating the ER(-) breast tumor transformation process, generated by SILAC-based proteomics profiling [15] and containing 7,800 proteins. The second is the proteomic profiling of 3,899 genes across a set of 155 normal and tumor colorectal samples taken from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal [25].

3.1 Up to half of the human network modules are dysregulated during cancer progression

The breast cancer progression (BC) dataset covers 38% of the proteins contained in the human interactome. This allowed us to analyze 414 network modules that fulfill our selection criteria (Section 2.5), i.e., roughly 50% of the 819 interactome modules formed by at least 5 proteins. According to the Kruskal-Wallis test, 138 modules, containing a total of 2,985 proteins, were significantly dysregulated across cancer stages (Figure 2, Supplementary Table S3). These modules represent 33% of the 414 analyzed modules and 17% of all 819 interactome modules.

Similarly, using the colorectal cancer (CRC) proteomics dataset (which covers 29% of the interactome proteins), we analyzed 157 network modules complying with the criteria described in Section 2.5 (i.e., 19% of the interactome modules). Seventy-seven of them (accounting for 49% and 9.4% of the studied and of all interactome modules, respectively) were found significantly dysregulated according to the KW test (Figure 2, Supplementary Table S4).
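The proportions quoted in this section follow directly from the module counts given in the text; the short Python check below only recomputes them from those counts (no additional data), using one-decimal rounding.

```python
# Module counts as reported in Section 3.1.
ALL_MODULES = 819  # interactome modules with at least 5 proteins
COUNTS = {
    "BC": {"analysed": 414, "dysregulated": 138},
    "CRC": {"analysed": 157, "dysregulated": 77},
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


for name, c in COUNTS.items():
    print(
        f"{name}: analysed {pct(c['analysed'], ALL_MODULES)}% of all modules; "
        f"dysregulated {pct(c['dysregulated'], c['analysed'])}% of analysed, "
        f"{pct(c['dysregulated'], ALL_MODULES)}% of all"
    )
```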

Overall, these results show that for 33 to 49% of the investigated functional modules, the global protein expression differs significantly at one or several cancer progression stages compared to the others. From a functional perspective, in both datasets, these dysregulated modules are mainly involved in signaling, cell cycle and transcriptional regulation (Supplementary Tables S5 and S6).

The higher proportion of dysregulated modules detected with CRC than with BC data (49% vs. 33%) possibly reflects the heterogeneity of tumor samples compared to cell lines. However, we cannot exclude that this difference is due to the extent of interactome coverage by each proteomic dataset.

3.2 Increasing and decreasing expression trends of dysregulated modules

In the BC dataset, by applying the Jonckheere's Trend test to the KW dysregulated modules, we detected significantly increasing expression across the stages, from normal to metastatic, for 71 modules (51%), whereas 18 (13%) showed decreasing expression during progression. The remaining 49 dysregulated modules did not show any significant trend. Whereas the modules with increasing expression are involved in functions supporting sustained proliferation, such as cell cycle, DNA replication, recombination and repair, mRNA splicing, transcription regulation, and several signaling pathways (e.g., events mediated by Notch, SMAD2/3 and P53), the modules with the opposite trend reveal a decrease in the expression of proteins acting in focal adhesion, regulation of the actin cytoskeleton, extracellular matrix receptor interactions, and cell surface receptor and EGFR signaling (Supplementary Data B.1 and B.2, Supplementary Tables S3 and S5). The decreased expression of these cellular functions, which are involved in the maintenance of tissue integrity, is a hallmark of cancer development leading to metastasis [47].

In the CRC dataset, the JT test detected significantly increasing expression for 19 modules (24.6%) and significantly decreasing expression for 5 modules (6.5%). Here again, the increasing trend is associated with cell proliferation functions such as mRNA splicing and the spliceosome, DNA repair, and Notch and SMAD2/3 signaling. On the other hand, modules displaying a decreasing trend are annotated with functions related to cell cycle arrest, protein degradation via the proteasome, and antigen processing/presentation (Supplementary Data B.3 and B.4, Supplementary Tables S4 and S6), the latter being related to immune surveillance evasion, an emerging cancer hallmark [47].

Interestingly, 29 modules of the interactome are found dysregulated using both proteomic datasets (Supplementary Tables S3 and S4). Nearly 80% of these (23/29) show the same trend according to the JT test, among which 39% (i.e., 9/23) reach statistical significance. These modules denote biological processes commonly dysregulated in both cancers, essentially RNA splicing and DNA damage response, which increase with cancer progression, whereas protein degradation decreases (see Section 3.4). Analogously, identifying modules dysregulated in only one cancer could be highly informative; however, such findings might be influenced by differences in proteome coverage between the two datasets and may simply reflect a lack of data for the proteins of a module in one of them.

3.3 Two types of stage-specific modules

Using the Dunn test (see Section 2.5.3), we identified, among the dysregulated modules without any trend, the specific stage(s) showing the most significant changes in expression. Twenty-six percent (BC) and 47% (CRC) of these modules showed significantly higher expression in one or two specific stages (Figure 2, Supplementary Tables S3 and S4). Notably, the highest significant expression is detected in Stage II for the majority of those modules (9/13, i.e., 69%, and 23/25, i.e., 92%, for BC and CRC, respectively). This corresponds to an expression burst that, in most cases, is not maintained through the following stages (see, for instance, modules 319 and 386 in Supplementary Figure S6). Modules in this category are involved in signaling, regulation of the actin cytoskeleton, focal adhesion and extracellular matrix receptor interaction.

Conversely, significantly lower expression in one or two stages is revealed in 32.6% and 39.6% of the dysregulated modules for BC and CRC, respectively (Figure 2, Supplementary Table). Interestingly, for most of these modules, the normal stage is the lowest expressed one, whereas the other stages have higher but similar expression distributions (see, for example, Modules 201 and 309 in Supplementary Figure S6). This corresponds to a sharp increase in expression in these modules in the premalignant stage, which is then maintained through cancer progression. These modules are dedicated to mRNA splicing, DNA replication and, in particular, the Toll-like receptor pathways, which are related to tumor-promoting inflammation [48].

Overall, the Dunn test allowed us to distinguish cellular functions dysregulated in one particular stage from those dysregulated from Stage I onwards and throughout cancer progression.

3.4 An example of a module dysregulated in both BC and CRC

In the colorectal dataset, four network modules related to the proteasome show a significantly decreasing trend during progression (see Supplementary Table S4). One of these (Module 780) also shows the same significant trend in breast cancer (Figure 3). Cancer cells exploit the ubiquitin-proteasome system for their growth [49]. For this reason, specific inhibitors have been developed to target the proteasome machinery in different cancer types [50–52]. However, several reports have highlighted that low proteasome activity (i) is a distinct feature of cancer cells with a high self-renewal capacity and a stem-like phenotype [53–57] and (ii) has been associated with decreased survival in head and neck cancer patients treated with radiotherapy [58]. Our results are consistent with this scenario: tumor samples in the CRC dataset come from patients who received radiotherapy treatment [25], and low proteasome activity has recently been observed in a sub-population of certain breast cancer cell lines [55].

4. Conclusions

We have proposed a novel computational method for integrating quantitative proteomics and protein interaction data, based on network clustering and statistical analyses. We have shown that our approach is able to identify relevant dysregulated cellular functions involved in cancer progression. The method is modular in its conception and can be adapted to any network. Furthermore, it does not depend on the network clustering algorithm used to identify functional modules. More importantly, the method is generic: any proteomic dataset measuring changes during the progression of a phenomenon of interest, such as disease or cancer progression or infection kinetics, can be studied. In particular, proteomic data obtained from a single patient over time could also be analyzed with our approach in a personalized medicine setting.

Acknowledgements

The authors received financial support from the French “Plan Cancer 2009-2013” (Systems Biology call, A12171AS). The authors thank Anaïs Baudot and Elisabeth Remy (I2M, CNRS, Marseille), Lionel Spinelli (TAGC, Aix-Marseille University, Marseille), Luc Camoin (CRCM, Inserm, Marseille) and all the partners of the Hsp27BioSys project for fruitful discussions.

References

[1] T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 10869–74.

[2] T. Sorlie, R. Tibshirani, J. Parker, T. Hastie, J.S. Marron, A. Nobel, et al., Repeated observation of breast tumor subtypes in independent gene expression data sets, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 8418–23.

[3] J. Lapointe, C. Li, J.P. Higgins, M. van de Rijn, E. Bair, K. Montgomery, et al., Gene expression profiling identifies clinically relevant subtypes of prostate cancer, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 811–6. doi:10.1073/pnas.0304146101.

[4] Y. Wang, J.G.M. Klijn, Y. Zhang, A.M. Sieuwerts, M.P. Look, F. Yang, et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, Lancet. 365 (2005) 671–9.

[5] A. Naderi, A.E. Teschendorff, N.L. Barbosa-Morais, S.E. Pinder, A.R. Green, D.G. Powe, et al., A gene-expression signature to predict survival in breast cancer across independent data sets, Oncogene. 26 (2007) 1507–16.

[6] T. Iwamoto, L. Pusztai, Predicting prognosis of breast cancer with gene signatures: are we lost in a sea of data?, Genome Med. 2 (2010) 81.

[7] L.I. Furlong, Human diseases through the lens of network biology, Trends Genet. TIG. 29 (2013) 150–9. doi:10.1016/j.tig.2012.11.004.

[8] K.M. Mani, C. Lefebvre, K. Wang, W.K. Lim, K. Basso, R. Dalla-Favera, et al., A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas, Mol. Syst. Biol. 4 (2008) 169. doi:10.1038/msb.2008.2.

[9] H.-Y. Chuang, E. Lee, Y.-T. Liu, D. Lee, T. Ideker, Network-based classification of breast cancer metastasis, Mol. Syst. Biol. 3 (2007) 140. doi:10.1038/msb4100180.

[10] T. Maier, M. Güell, L. Serrano, Correlation of mRNA and protein in complex biological samples, FEBS Lett. 583 (2009) 3966–3973. doi:10.1016/j.febslet.2009.10.036.

[11] M. Uhlén, L. Fagerberg, B.M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, et al., Proteomics. Tissue-based map of the human proteome, Science. 347 (2015) 1260419. doi:10.1126/science.1260419.

[12] C. Kiel, A. Vogt, A. Campagna, A. Chatr-aryamontri, M. Swiatek-de Lange, M. Beer, et al., Structural and functional protein network analyses predict novel signaling functions for rhodopsin, Mol. Syst. Biol. 7 (2011) 551. doi:10.1038/msb.2011.83.

[13] S.J. Orr, D.R. Boutz, R. Wang, C. Chronis, N.C. Lea, T. Thayaparan, et al., Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry, Mol. Syst. Biol. 8 (2012) 573. doi:10.1038/msb.2012.5.

[14] J. Cox, M. Mann, Quantitative, high-resolution proteomics for data-driven systems biology, Annu. Rev. Biochem. 80 (2011) 273–99.

[15] T. Geiger, S.F. Madden, W.M. Gallagher, J. Cox, M. Mann, Proteomic portrait of human breast cancer progression identifies novel prognostic markers, Cancer Res. 72 (2012) 2428–2439. doi:10.1158/0008-5472.CAN-11-3711.

[16] B. Zhang, J. Wang, X. Wang, J. Zhu, Q. Liu, Z. Shi, et al., Proteogenomic characterization of human colon and rectal cancer, Nature. 513 (2014) 382–387. doi:10.1038/nature13438.

[17] L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, From molecular to modular cell biology, Nature. 402 (1999) C47–52. doi:10.1038/35011540.

[18] A.-L. Barabási, Z.N. Oltvai, Network biology: understanding the cell’s functional organization, Nat. Rev. Genet. 5 (2004) 101–113.

[19] C. Pizzuti, S.E. Rombo, Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods, Bioinforma. Oxf. Engl. 30 (2014) 1343–1352. doi:10.1093/bioinformatics/btu034.

[20] K.I. Goh, M.E. Cusick, D. Valle, B. Childs, M. Vidal, A.L. Barabasi, The human disease network, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 8685–90.

[21] J. Menche, A. Sharma, M. Kitsak, S.D. Ghiassian, M. Vidal, J. Loscalzo, et al., Disease networks. Uncovering disease-disease relationships through the incomplete interactome, Science. 347 (2015) 1257601. doi:10.1126/science.1257601.

[22] E.E. Schadt, Molecular networks as sensors and drivers of common human diseases, Nature. 461 (2009) 218–23. doi:10.1038/nature08454.

[23] A. Zanzoni, M. Soler-López, P. Aloy, A network medicine approach to human disease, FEBS Lett. 583 (2009) 1759–1765. doi:10.1016/j.febslet.2009.03.001.

[24] A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat. Rev. Genet. 12 (2011) 56–68. doi:10.1038/nrg2918.

[25] N.J. Edwards, M. Oberti, R.R. Thangudu, S. Cai, P.B. McGarvey, S. Jacob, et al., The CPTAC Data Portal: A Resource for Cancer Proteomics Research, J. Proteome Res. (2015). doi:10.1021/pr501254j.

[26] E. Becker, B. Robisson, C.E. Chapple, A. Guénoche, C. Brun, Multifunctional proteins revealed by overlapping clustering in protein interaction network, Bioinforma. Oxf. Engl. 28 (2012) 84–90. doi:10.1093/bioinformatics/btr621.

[27] C.E. Chapple, B. Robisson, L. Spinelli, C. Guien, E. Becker, C. Brun, Extreme multifunctional proteins identified from a human protein interaction network, Nat. Commun. 6 (2015) 7412. doi:10.1038/ncomms8412.

[28] C. Prieto, J. De Las Rivas, APID: Agile Protein Interaction DataAnalyzer, Nucleic Acids Res. 34 (2006) W298–302. doi:10.1093/nar/gkl128.

[29] C. Stark, B.-J. Breitkreutz, A. Chatr-Aryamontri, L. Boucher, R. Oughtred, M.S. Livstone, et al., The BioGRID Interaction Database: 2011 update, Nucleic Acids Res. 39 (2011) D698–704. doi:10.1093/nar/gkq1116.

[30] L. Salwinski, C.S. Miller, A.J. Smith, F.K. Pettit, J.U. Bowie, D. Eisenberg, The Database of Interacting Proteins: 2004 update, Nucleic Acids Res. 32 (2004) D449–51. doi:10.1093/nar/gkh086.

[31] S. Kerrien, B. Aranda, L. Breuza, A. Bridge, F. Broackes-Carter, C. Chen, et al., The IntAct molecular interaction database in 2012, Nucleic Acids Res. 40 (2012) D841–846. doi:10.1093/nar/gkr1088.

[32] K. Breuer, A.K. Foroushani, M.R. Laird, C. Chen, A. Sribnaia, R. Lo, et al., InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation, Nucleic Acids Res. 41 (2013) D1228–1233. doi:10.1093/nar/gks1147.

[33] E. Chautard, M. Fatoux-Ardore, L. Ballut, N. Thierry-Mieg, S. Ricard-Blum, MatrixDB, the extracellular matrix interaction database, Nucleic Acids Res. 39 (2011) D235–240. doi:10.1093/nar/gkq830.

[34] L. Licata, L. Briganti, D. Peluso, L. Perfetto, M. Iannuccelli, E. Galeota, et al., MINT, the molecular interaction database: 2012 update, Nucleic Acids Res. 40 (2012) D857–861. doi:10.1093/nar/gkr930.

[35] R. Elkon, R. Vesterman, N. Amit, I. Ulitsky, I. Zohar, M. Weisz, et al., SPIKE--a database, visualization and analysis tool of cellular signaling pathways, BMC Bioinformatics. 9 (2008) 110. doi:10.1186/1471-2105-9-110.

[36] P.F. Lange, C.M. Overall, TopFIND, a knowledgebase linking protein termini with function, Nat. Methods. 8 (2011) 703–704. doi:10.1038/nmeth.1669.

[37] B. Aranda, H. Blankenburg, S. Kerrien, F.S.L. Brinkman, A. Ceol, E. Chautard, et al., PSICQUIC and PSISCORE: accessing and scoring molecular interactions, Nat. Methods. 8 (2011) 528–529. doi:10.1038/nmeth.1637.

[38] J.-F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, et al., Towards a proteome-scale map of the human protein-protein interaction network, Nature. 437 (2005) 1173–1178. doi:10.1038/nature04209.

[39] R. Mosca, A. Céol, P. Aloy, Interactome3D: adding structural details to protein networks, Nat. Methods. 10 (2013) 47–53. doi:10.1038/nmeth.2289.

[40] L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: accelerated for clustering the next-generation sequencing data, Bioinforma. Oxf. Engl. 28 (2012) 3150–3152. doi:10.1093/bioinformatics/bts565.

[41] The Gene Ontology Consortium, The Gene Ontology in 2010: extensions and refinements, Nucleic Acids Res. 38 (2010) D331–5. doi:10.1093/nar/gkp1018.

[42] D. Barrell, E. Dimmer, R.P. Huntley, D. Binns, C. O’Donovan, R. Apweiler, The GOA database in 2009–an integrated Gene Ontology Annotation resource, Nucleic Acids Res. 37 (2009) D396–403. doi:10.1093/nar/gkn803.

[43] M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, M. Tanabe, KEGG for integration and interpretation of large-scale molecular data sets, Nucleic Acids Res. 40 (2012) D109–114. doi:10.1093/nar/gkr988.

[44] D. Croft, G. O’Kelly, G. Wu, R. Haw, M. Gillespie, L. Matthews, et al., Reactome: a database of reactions, pathways and biological processes, Nucleic Acids Res. 39 (2011) D691–697. doi:10.1093/nar/gkq1018.

[45] C.F. Schaefer, K. Anthony, S. Krupa, J. Buchoff, M. Day, T. Hannay, et al., PID: the Pathway Interaction Database, Nucleic Acids Res. 37 (2009) D674–679. doi:10.1093/nar/gkn653.

[46] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 15545–15550. doi:10.1073/pnas.0506580102.
[47] D. Hanahan, R.A. Weinberg, Hallmarks of Cancer: The Next Generation, Cell. 144 (2011) 646–674. doi:10.1016/j.cell.2011.02.013.
[48] J.-P. Pradere, D.H. Dapito, R.F. Schwabe, The Yin and Yang of Toll-like receptors in cancer, Oncogene. 33 (2014) 3485–3495. doi:10.1038/onc.2013.302.
[49] A. Mani, E.P. Gelmann, The Ubiquitin-Proteasome Pathway and Its Role in Cancer, J. Clin. Oncol. 23 (2005) 4776–4789. doi:10.1200/JCO.2005.05.081.
[50] L.J. Crawford, B. Walker, A.E. Irvine, Proteasome inhibitors in cancer therapy, J. Cell Commun. Signal. 5 (2011) 101–110. doi:10.1007/s12079-011-0121-7.
[51] N. Rastogi, D.P. Mishra, Therapeutic targeting of cancer cell cycle using proteasome inhibitors, Cell Div. 7 (2012) 26. doi:10.1186/1747-1028-7-26.
[52] M. Shen, S. Schmitt, D. Buac, Q.P. Dou, Targeting the ubiquitin-proteasome system for cancer therapy, Expert Opin. Ther. Targets. 17 (2013) 1091–1108. doi:10.1517/14728222.2013.815728.
[53] C. Lagadec, E. Vlashi, L. Della Donna, Y. Meng, C. Dekmezian, K. Kim, et al., Survival and self-renewing capacity of breast cancer initiating cells during fractionated radiation treatment, Breast Cancer Res. 12 (2010) R13. doi:10.1186/bcr2479.
[54] J. Pan, Q. Zhang, Y. Wang, M. You, 26S Proteasome Activity Is Down-Regulated in Lung Cancer Stem-Like Cells Propagated In Vitro, PLoS ONE. 5 (2010) e13298. doi:10.1371/journal.pone.0013298.
[55] E. Vlashi, C. Lagadec, M. Chan, P. Frohnen, A.J. McDonald, F. Pajonk, Targeted elimination of breast cancer cells with low proteasome activity is sufficient for tumor regression, Breast Cancer Res. Treat. 141 (2013) 197–203. doi:10.1007/s10549-013-2688-6.
[56] K. Munakata, M. Uemura, J. Nishimura, T. Hata, I. Takemasa, T. Mizushima, et al., Abstract 858: Treatment resistance of colon cancer with low proteasome activity, Cancer Res. 74 (2014) 858–858. doi:10.1158/1538-7445.AM2014-858.
[57] M. Uemura, K. Munakata, J. Nishimura, T. Hata, I. Takemasa, T. Mizushima, et al., Abstract 1401: Low proteasome activity and cancer stemness in colorectal cancer, Cancer Res. 75 (2015) 1401–1401. doi:10.1158/1538-7445.AM2015-1401.
[58] C. Lagadec, E. Vlashi, S. Bhuta, C. Lai, P. Mischel, M. Werner, et al., Tumor cells with low proteasome subunit expression predict overall survival in head and neck cancer patients, BMC Cancer. 14 (2014) 152. doi:10.1186/1471-2407-14-152.
[59] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, D. Ramage, et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res. 13 (2003) 2498–2504. doi:10.1101/gr.1239303.

Figure Captions

Figure 1. Workflow of the proposed computational approach that combines quantitative proteomics data with protein interaction analysis to identify significantly dysregulated cellular functions during cancer progression.

Figure 2. Summary of the statistical analysis results for the detection of significantly dysregulated modules during breast (upper panel) and colorectal (lower panel) cancer progression.

Figure 3. Network module 780 shows a significantly decreasing expression trend in both BC and CRC. (A) Expression profile of module 780 during BC progression. Proteins with expression data are depicted as circles, whereas proteins for which expression data are missing are represented as grey diamonds. Positive and negative expression Z-scores are shown as shades of violet and orange, respectively. (B) Expression profile of module 780 during CRC progression. Color coding as in (A). Network module representations were generated using Cytoscape [59].

Appendix A. Supplementary Data

Supplementary Data A. Supplementary methods for the data pre-processing of the proteomic profiling experiments used in this study.

Supplementary Data B. Dysregulated modules with the strongest significant trend.

Supplementary Tables. The human interactome used in this study; the network modules detected by the OCG algorithm; dysregulated modules in breast and colorectal cancer datasets; functional annotations of dysregulated network modules.

Supplementary Figure 1. Significantly dysregulated network modules with an increasing trend in expression in BC.

Supplementary Figure 2. Significantly dysregulated network modules with a decreasing trend in expression in BC.

Supplementary Figure 3. Significantly dysregulated network modules with no significant trend in expression in BC.

Supplementary Figure 4. Significantly dysregulated network modules with an increasing trend in expression in CRC.

Supplementary Figure 5. Significantly dysregulated network modules with a decreasing trend in expression in CRC.

Supplementary Figure 6. Significantly dysregulated network modules with no significant trend in expression in CRC.




We propose a network-based method to study cancer progression using proteomics data



We found network modules showing distinct dysregulation trends during progression



Our approach can be applied to any stage-based or time-resolved proteomics dataset
