New Biotechnology Volume 30, Number 3 March 2013
RESEARCH PAPER
Experimental and computational methods for the analysis and modeling of signaling networks
Pier Federico Gherardini and Manuela Helmer-Citterich
Department of Biology, University of Rome ‘Tor Vergata’, Via della Ricerca Scientifica 1, 00133 Rome, Italy
External cues are processed and integrated by signal transduction networks that drive appropriate cellular responses. Characterizing these programs, as well as how their deregulation leads to disease, is crucial for our understanding of cell biology. The past ten years have witnessed a gradual increase in the number of molecular parameters that can be simultaneously measured in a sample. Moreover, our capacity to handle multiple samples in parallel has expanded, thus allowing a deeper profiling of cellular states under diverse experimental conditions. These technological advances have been complemented by the development of computational methods aimed at mining, analyzing and modeling these data. In this review we give a general overview of the most important experimental and computational techniques used in the field and describe several interesting applications of these methodologies. We conclude by highlighting the issues that we think will keep researchers in the field busy in the next few years.
Corresponding author: Helmer-Citterich, M. ([email protected])
Introduction
Interrogating and modeling the behavior of signaling networks is crucial to understand cell biology in health and disease. This is especially evident in cancer research, where it has been conclusively shown that understanding the mechanisms leading to pathogenesis requires abstracting away from mutations in specific genes and focusing the analysis on the mechanisms by which cellular pathways are rewired [1]. Accordingly the concept of ‘network medicine’ has emerged [2], which postulates that cellular networks, more than single genes, should be the targets of therapeutic intervention. Recently this principle has been very effectively demonstrated in breast cancer [3].
To reverse-engineer regulatory networks it is necessary to monitor how genes/proteins are activated under different conditions. This is more easily achieved for genes, thanks to the exceptional development of gene expression analysis techniques such as microarrays and RNA-seq. Accordingly the majority of the efforts in this area have been devoted to the analysis of gene networks. However, the earliest cellular response to an external cue usually consists of the activation of upstream signaling networks, which in turn regulate transcription factors. The modulation of gene expression therefore represents a later event. Moreover, the expression of a gene is not necessarily correlated with the abundance of the corresponding protein product. The fact that the activity of a protein may depend on post-translational modifications (mainly phosphorylation) further complicates the picture. This reasoning motivates the development of techniques to study the activity of signaling networks at the protein level, which are the focus of the present review (see also [4,5]).
From a data analysis perspective these works are based on the premise that it is possible to reveal the structure of a regulatory network by monitoring the correlations between the activities of its constituent proteins. To this end one must gather a dataset that contains variability in activity values. Accordingly, perturbation experiments are performed where cells are subjected to multiple cues and the corresponding state of the network is recorded. This conceptual framework imposes two requirements on the experimental methodology:
1. It must be possible to reliably measure the amount of 10 or more analytes in a sample. Moreover, it is usually necessary to monitor specific phosphorylation sites on the proteins.
1871-6784/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.nbt.2012.11.007
www.elsevier.com/locate/nbt
2. It must be possible to effectively work on multiple samples in parallel, corresponding to the different perturbation conditions.
This area of research lies right at the interface of experiment and computation and advances in one field continuously inform the development of the other. In the following paragraphs we will provide a general overview of the experimental methods and computational techniques used in the field. Automated microscopy, a popular platform for multi-parametric screening, will not be addressed in this review due to space constraints. We refer interested readers to [6,7].
Experimental methods
Two classes of methods exist to measure the abundance of a protein in a sample: those based on affinity reagents (i.e. antibodies) and mass spectrometry. Antibody-based methods in general require: (i) a system to detect the binding between the antibody and its target; (ii) a multiplexing strategy that allows either multiple antibodies or multiple samples, or both, to be combined in the same analysis.
Protein/lysate/antibody arrays
In these methods, reviewed in [8,9], different samples or reagents are densely arrayed on a physical support. Either the antibodies or the sample lysates can be printed on the array; the latter format is usually termed lysate array or Reverse-Phase Protein Array (RPPA). The slide is then incubated with a sample or a specific antibody, depending on what has been printed on the array. Furthermore, in diagnostic applications (which will not be discussed further) the array can be printed with protein antigens to determine whether a patient sample contains antibodies directed against them. Advances in these methods are tied to technological improvements in microarray printers [10,11], which must compactly dispense samples and reagents in a consistent way, both in volume and in position. Another important characteristic is the capability to handle samples of different viscosities. In the first description of an RPPA, 1000 lysates could be printed on a slide a few centimeters on a side [12]; currently 30,000 spots are possible [8]. Lysates are usually printed in serial dilutions to achieve a more precise quantification.
Microwestern
Gel separation offers the unique advantage that the complexity of the sample is reduced by separating the proteins by molecular weight or charge. In this way, even if an antibody cross-reacts with other proteins, it is possible to select and quantify the signal from the intended target only. In the microwestern array technique [13] cell lysates are printed on a gel via a microarrayer, with the end result essentially being a combination of 96 independent micro-gels on the same gel. The gel is then run in semidry electrophoresis, transferred onto nitrocellulose and incubated with antibodies using a gasket that allows each micro-gel to be physically separated from the others. After incubation with a dye-labeled secondary antibody the signal is quantified with a scanner. The above protocol combines the throughput of RPPAs with the reduction of sample complexity afforded by the electrophoretic separation step.
Luminex
In the xMAP technology by Luminex the multiplexing is achieved by mixing together different beads. Each bead is characterized by a specific fluorescence obtained by mixing different fluorophores, and is coated with an antibody of interest. A second antibody for each target is then used, resulting in a sandwich assay. Antibodies in this second set are all labeled with the same fluorophore. In the original implementation of the technology [14] the beads, after incubation with the sample, were run through a flow cytometer that measured both the fluorescence of each bead, identifying the specific analyte, and the signal from the second antibody, thus measuring the amount of protein bound to the bead, and therefore the concentration of the protein in the sample. The MAGPIX technology, introduced in 2010, dispenses with the flow cytometer by using magnetic beads. A magnet inside the instrument holds the beads in a monolayer, which is imaged using a CCD camera that records and spatially matches the two fluorescent signals.
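As a rough illustration of the decoding step described above, the sketch below groups simulated bead events by their classification fluorescence and summarizes the reporter signal for each analyte. The bead-region table, the event values and the choice of the median as summary statistic are assumptions made for this example; they are not taken from the Luminex software.

```python
from collections import defaultdict
from statistics import median

# Hypothetical mapping from bead classification region to the analyte
# captured by the antibody coated on that bead (assumed for illustration).
BEAD_REGION_TO_ANALYTE = {12: "p-ERK", 25: "p-AKT", 37: "p-JNK"}


def summarize_beads(events):
    """Group (bead_region, reporter_signal) events by analyte and return
    the median reporter fluorescence observed for each analyte."""
    by_analyte = defaultdict(list)
    for region, reporter in events:
        analyte = BEAD_REGION_TO_ANALYTE.get(region)
        if analyte is not None:  # skip events from unknown bead regions
            by_analyte[analyte].append(reporter)
    return {name: median(values) for name, values in by_analyte.items()}


# Made-up events: (bead region, reporter fluorescence).
events = [
    (12, 810.0), (12, 790.0), (12, 805.0),
    (25, 120.0), (25, 131.0),
    (99, 5000.0),  # region not in the table, so it is discarded
]
signals = summarize_beads(events)
```

Summarizing many beads per analyte makes the readout less sensitive to individual outlier beads, which is why a robust statistic was chosen for this sketch.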
Flow cytometry
Flow cytometry is a technique that permits multi-parametric analyses to be efficiently carried out on a single-cell basis [15]. Cells are first stained with a panel of specific antibodies and then forced to travel in single file by the fluidic system of the cytometer. The antibodies are conjugated with fluorophores and the fluorescence signal is collected as each cell passes through a laser beam. Currently the maximum number of colors that can be used simultaneously is 18. However, this is a theoretical limit: in practice panels of more than 10 analytes are extremely complex to design because the spectra of the fluorophores bleed into each other and confuse the measurements, requiring a laborious analysis process termed compensation. Moreover, for the same reason, the antibodies must be matched with the fluorophores according to the expected amount of the corresponding analyte in the sample, with weak fluorophores used for abundant proteins and vice versa. Recently, however, a breakthrough innovation has been described, termed Mass Cytometry or Cytometry by Time-Of-Flight (CyTOF), which solves the above-mentioned issues [16,17]. In mass cytometry antibodies are labeled with metal isotopes, typically lanthanides, and the stained cells are then injected one at a time into an Inductively Coupled Plasma Mass Spectrometer (ICP-MS), which measures the amount of the metals in each cell. This allows for an extremely precise quantification, owing both to the sensitivity of the spectrometer and to the fact that signals from the different isotopes do not overlap. Currently 35 analytes can be measured simultaneously. Different samples can be multiplexed together in flow cytometry. This has been achieved by barcoding cells either with fluorophore combinations, in traditional flow cytometry [18], or with metal tags in mass cytometry [19].
These techniques allow entire 96-well plates to be combined together before analysis, thus achieving a consistent staining process across different samples and reducing antibody consumption and acquisition time.
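The compensation step mentioned above for fluorescence-based flow cytometry can be pictured as undoing a linear mixing of signals. The sketch below assumes a two-fluorophore panel and a known spillover matrix; the function name, the matrix layout and all numbers are illustrative assumptions, and real panels use larger matrices estimated from single-stained controls.

```python
def compensate_two_colors(observed, spill):
    """Recover the true signals of two fluorophores from two detectors.

    Assumes observed = spill * true, where spill[i][j] is the fraction of
    fluorophore j's signal that is seen in detector i (a linear model).
    """
    (a, b), (c, d) = spill
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    o1, o2 = observed
    # Apply the inverse of the 2x2 spillover matrix.
    return ((d * o1 - b * o2) / det, (a * o2 - c * o1) / det)


# Illustrative spillover: 15% of fluorophore 2 bleeds into detector 1,
# 10% of fluorophore 1 bleeds into detector 2.
spill = ((1.0, 0.15), (0.10, 1.0))
true_signal = (1000.0, 200.0)
observed = (
    spill[0][0] * true_signal[0] + spill[0][1] * true_signal[1],
    spill[1][0] * true_signal[0] + spill[1][1] * true_signal[1],
)
recovered = compensate_two_colors(observed, spill)
```

With many fluorophores the same idea requires inverting a larger matrix, and measurement noise is amplified by the inversion, which is one way to see why large fluorescence panels are hard to design.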
Mass spectrometry
Mass spectrometry (MS) is a technique that allows proteins in a sample to be identified and quantified without using affinity reagents. There is a multitude of different techniques and instruments, and a full description of this topic is outside the scope of this review. We will provide only a very basic summary, glossing over the intricacies of each step. Interested readers are referred to [20] for a very good introduction to the subject.
Computational methods
All the above-mentioned experimental methods are aimed at gathering a matrix of values representing the abundance of specific phosphorylation sites, acting as a proxy for the activity of the protein, under different experimental conditions. This information is the starting point for the modeling strategies described in this section (see also [25] for an excellent review). An important distinction to be made is between data-driven methods [26], for example regression-based techniques and Bayesian networks, and approaches that incorporate various degrees of previous knowledge such as Boolean networks and models based on systems of differential equations. Clearly the distinction, while useful, is not completely exclusive as prior knowledge can be used to inform the development of data-driven models.
Partial least squares regression
This technique was developed in the field of chemometrics [27] and has been very successfully applied to the study of signaling in several works by the group of Douglas Lauffenburger at MIT (see below, Practical applications). Partial least squares regression consists of a regression analysis that is performed in a reduced space defined by several principal components. These principal components are chosen to maximize the covariance, and thus the predictive power, between a set of independent (predictor) variables and a set of dependent (response) variables. In most published applications the predictors are the activity values of the proteins in the network and the responses are higher-level phenotypes such as apoptosis or cell-cycle progression. Besides being useful for prediction, the model can be inspected to reveal relationships between the variables. An often-displayed feature is the ‘loadings plot’ that shows how strongly the variables project along each principal component. Points that are close in this space represent correlated variables.
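To make the idea of a covariance-maximizing component more concrete, the sketch below fits a partial least squares model with a single component and a single response, in plain Python. The function name, the toy signaling matrix and the phenotype values are assumptions made for illustration; published analyses typically use several components and dedicated statistical software.

```python
from math import sqrt


def pls1_one_component(X, y):
    """Fit a one-component PLS model for a single response variable.

    Returns the unit-length weight vector (one weight per predictor)
    and a function that predicts the response for a new sample.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_means[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return w, predict


# Toy data (made up): rows are conditions, columns are three phosphoproteins.
X = [
    [0.1, 0.9, 0.2],
    [0.4, 0.7, 0.1],
    [0.6, 0.4, 0.3],
    [0.9, 0.2, 0.2],
]
y = [0.05, 0.30, 0.55, 0.90]  # hypothetical phenotype, e.g. apoptotic fraction
weights, predict = pls1_one_component(X, y)
```

In this toy dataset the first predictor rises with the response and the second falls, so their weights have opposite signs; inspecting such weights is the one-component analogue of reading a loadings plot.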
Bayesian networks
Bayesian networks are powerful instruments to model and represent the correlation structure of a group of variables (see [28] for an excellent primer). Central to this method is the notion of ‘conditional independence’. Imagine three proteins that are connected in a linear activation cascade A → B → C. If we look at the raw correlation values they will all appear to be correlated with each other in activity. However, the correlation between A and C is mediated by B: once we know B, the value of A does not give any further information on the state of C. In Bayesian network terminology A and C are conditionally independent given B, and this would be represented as a graph connecting A, B and C in a linear cascade, in direct parallel with the biological intuition. Therefore the aim of a method implementing this approach is to reconstruct the graph representing the signaling network starting only from the analyte values, by looking at their correlations. One caveat is that it is not always possible to determine the directionality of a relationship, that is, whether A activates B or vice versa, but this issue can be resolved with specific inhibition experiments.
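The conditional independence argument for the A, B, C cascade can be checked numerically. The sketch below simulates a linear cascade with Gaussian noise and compares the raw correlation between A and C with their partial correlation given B. The linear-Gaussian simulation, the sample size and the function names are assumptions made for this example; it illustrates the concept only and is not a structure-learning algorithm for Bayesian networks.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated cascade A -> B -> C: each protein follows its upstream
# activator plus independent noise (hypothetical data).
rng = random.Random(0)
A = [rng.gauss(0, 1) for _ in range(5000)]
B = [a + rng.gauss(0, 1) for a in A]
C = [b + rng.gauss(0, 1) for b in B]

r_ac = pearson(A, C)                 # clearly non-zero
r_ac_given_b = partial_corr(A, C, B)  # close to zero
```

The raw correlation between A and C is substantial, while the partial correlation given B is near zero, which is the signature that a network-reconstruction method would use to omit a direct A to C edge.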
Boolean networks
Graph representations of signaling networks can be conveniently modeled as Boolean networks, where nodes correspond to proteins and directed edges encode regulatory interactions. In a pure Boolean formalism a node can assume only two values, 0 (inactive) and 1 (active), according to the state of its regulators. This dependency is encoded in truth tables that specify the state of a node corresponding to every possible combination of the values of its regulators. The AND logic operator is an example of such a table: the node value is 1 when all its regulators are 1 and 0 otherwise. While the two-state simplification is useful, it also requires a normalization of the experimental data that can be rather tricky. An extension of the Boolean formalism is fuzzy logic, where entities can have multiple states and it is possible to specify a different degree of membership to each state, expressed as a continuous value in the 0-1 range [29]. Models of this kind have been constructed from a manual analysis of the literature and hand-tuned to experimental data [30]. However, methods such as CellNetworkOptimizer [31] allow the structure of the model to be optimized against data from perturbation experiments. This software has also been extended to handle time-series data and continuous models based on differential equations [32]. Morris et al. [33] have also described a strategy for the optimization of fuzzy-logic models. The method does not implement the full fuzzy formalism described above, but allows nodes to have continuous values and the relationship between the activity of a node and its regulators to be modeled with continuous non-linear functions. Probabilistic extensions of the Boolean network approach have also been described [34,35]. The MetaReg software [35] assumes that a node can be regulated by multiple deterministic functions, a single one of which is selected according to a specific probability that is estimated from the data.
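The truth-table formalism described above can be sketched in a few lines. The example enumerates the truth table of an AND gate and performs one update of a tiny network; the node names A, B, C and the rule assigned to C are hypothetical and chosen only for illustration.

```python
from itertools import product


def and_gate(*regulators):
    """Node is active (1) only when all of its regulators are active."""
    return int(all(regulators))


def or_gate(*regulators):
    """Node is active (1) when at least one regulator is active."""
    return int(any(regulators))


def truth_table(rule, n_inputs):
    """Map every combination of regulator states to the node state."""
    return {
        inputs: rule(*inputs)
        for inputs in product((0, 1), repeat=n_inputs)
    }


# Hypothetical network: node C is activated by A AND B.
RULES = {"C": (and_gate, ("A", "B"))}


def update(state):
    """Return the next network state; nodes without rules keep their value."""
    new_state = dict(state)
    for node, (rule, regulators) in RULES.items():
        new_state[node] = rule(*(state[r] for r in regulators))
    return new_state


and_table = truth_table(and_gate, 2)
next_state = update({"A": 1, "B": 1, "C": 0})
```

Replacing `and_gate` with `or_gate` for node C changes the model's predictions under single-stimulus conditions, which is the kind of structural choice that optimization against perturbation data is meant to resolve.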
In practice the probabilistic formulation of MetaReg means that, contrary to other methods, it is not necessary to specify whether a relationship is activating or inhibitory, as the software will infer this from the data. Moreover the discretization functions are also automatically optimized.
In mass spectrometry-based proteomics the instrument measures the mass-to-charge ratio (m/z) of the analytes, which are peptides generated from the enzymatic digestion of a sample. Each peptide produces a specific m/z peak (actually a cluster of peaks, but we will not delve into the details). Single peptides (precursor ions in MS lingo) are then selected by the instrument and fragmented. By comparing the masses of the resulting fragments (product ions) it is possible to infer the sequence of the peptide, which is then matched to a database to identify the corresponding protein. To increase the sensitivity of the analysis it is possible to direct the instrument to monitor a specific pair of precursor and product ions, termed a transition, that has pre-determined m/z values. This approach is termed Selected Reaction Monitoring (SRM) [21]. In practice it is currently possible to monitor several transitions in the same run (Multiple Reaction Monitoring, MRM), thus resulting in the identification of hundreds of proteins and specific phosphorylation sites, which makes MRM the MS technique most suited to the analysis of signaling networks. A precise quantification in an MS experiment can be obtained by relating the signals originating from two copies of the same analyte in the sample, one of which acts as internal standard [22]. To have two signals that can be distinguished in a mass spectrometer one can: (i) grow the cells in a medium that contains amino acids labeled with a specific isotope (SILAC [23]); (ii) derivatize the samples post-lysis with specific chemical tags (e.g. iTRAQ); (iii) spike in (i.e. add) a heavy isotope version of the protein/peptide, possibly in known quantity (e.g. Absolute Quantification (AQUA) peptides [24]).
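The internal-standard quantification used in mass spectrometry, where the signal of the endogenous (light) peptide is related to that of a labeled (heavy) copy spiked in at known amount, reduces to a simple ratio. The sketch below assumes that light and heavy forms produce the same signal per molecule and that intensities from several monitored transitions are summed; the function name, units and intensity values are illustrative assumptions.

```python
def quantify_with_internal_standard(light, heavy, heavy_amount_fmol):
    """Estimate the amount of an endogenous peptide from a spiked standard.

    light, heavy: intensities of the monitored transitions for the
    endogenous and the isotope-labeled peptide, in the same order.
    heavy_amount_fmol: known amount of labeled peptide that was added.
    Assumes equal instrument response for the light and heavy forms.
    """
    if len(light) != len(heavy):
        raise ValueError("light and heavy must cover the same transitions")
    heavy_total = sum(heavy)
    if heavy_total <= 0:
        raise ValueError("no signal detected for the internal standard")
    return sum(light) / heavy_total * heavy_amount_fmol


# Made-up intensities for two transitions of one phosphopeptide.
light_intensities = [2.0e6, 1.0e6]
heavy_intensities = [1.0e6, 0.5e6]
amount = quantify_with_internal_standard(
    light_intensities, heavy_intensities, heavy_amount_fmol=50.0
)
```

Here the endogenous peptide gives twice the signal of the 50 fmol standard, so the estimate is 100 fmol. When the standard amount is unknown, as in metabolic labeling, the same ratio still provides a relative quantification between two samples.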
Differential equations models
These models aim at a direct representation of the physicochemical processes that underlie signal transduction. They are based on the observation that, in a chemical reaction, the rate at which the concentration of the products changes depends on the concentration of the reactants and on one or more rate constants. For instance, in an extremely simple formulation, the rate of change of a phosphosite may depend on the concentrations of the kinase and substrate times a rate constant. Multiple equations like this can be written to describe a signaling pathway of interest at an arbitrary level of detail. The model therefore comprises a system of equations that can be solved with numerical methods using a computer. The main issue, and the bottleneck for these kinds of models, is to estimate from experimental data the initial concentrations of the species and the values of the rate constants. There is an immense body of literature on these issues and a detailed treatment is outside the scope of this review (see [36] for an introduction to the subject). We just want to highlight a very interesting study [37] showing that, while some parameter combinations affect the predictive performance of a model, others can vary widely without having a significant impact on the model, a property termed ‘sloppiness’. This implies that, while a model could correctly predict the collective behavior of a system, one should not assume that the estimated parameter values reflect the underlying physical reality. The authors also show that even exceptionally accurate parameter measurements (that is, not achievable with current technologies) would lead to large prediction uncertainties if even a single parameter to which the model is sensitive were poorly constrained.
The authors therefore argue for using collective fit strategies, where one tries to optimize the predictions that are derived from the model against experimental data, instead of pursuing the reductionist approach of precisely estimating parameter values.
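The simple rate law mentioned above, in which the phosphorylated form accumulates at a rate equal to a rate constant times the kinase and substrate concentrations, can be integrated numerically as follows. The sketch assumes a constant kinase concentration, no dephosphorylation and a basic forward Euler scheme; the function name and all parameter values are arbitrary choices for illustration.

```python
from math import exp


def simulate_phosphorylation(k, kinase, substrate0, t_end, dt=1e-3):
    """Integrate d[pS]/dt = k * [K] * [S] with the forward Euler method.

    The kinase concentration [K] is held constant and the total amount
    of substrate ([S] + [pS]) is conserved. Returns ([S], [pS]) at t_end.
    """
    steps = int(round(t_end / dt))
    s, ps = substrate0, 0.0
    for _ in range(steps):
        rate = k * kinase * s  # mass-action rate of phosphorylation
        s -= rate * dt
        ps += rate * dt
    return s, ps


# Arbitrary units; values chosen only to make the example easy to check.
k, kinase, s0, t_end = 0.5, 2.0, 1.0, 1.0
s_final, ps_final = simulate_phosphorylation(k, kinase, s0, t_end)
s_exact = s0 * exp(-k * kinase * t_end)  # analytic solution for constant [K]
```

For this single reaction an analytic solution exists and serves as a check on the numerical result; for realistic pathway models with many coupled equations, numerical integration with a dedicated solver is the only practical route.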
Practical applications
In this section we will detail some of the most interesting applications that have been published in the field. Because of space constraints, we will especially focus on those involving an extensive modeling component, skipping for instance several works where mass spectrometry was used to characterize signaling networks but the data were not used for systematic computational modeling [38–41].
Data-driven modeling based on partial least squares regression (PLSR) has been very successfully applied in several works to link the activity of molecular signals to higher-level phenotypes of interest such as cell cycle progression and apoptosis [3,42–48]. Even though this approach does not provide a detailed network model of signaling, it is very effective at capturing and describing the phenotypic consequences of molecular events. These insights can be invaluable for hypothesis generation and to guide subsequent experimental work. In an extremely interesting example PLSR modeling was used to show that time-staggered EGFR inhibition unmasks an apoptotic pathway involving Caspase-8 activation that sensitizes triple-negative breast cancer cells to subsequent treatment with genotoxic drugs [3]. In another work the inspection of the PLSR model revealed an unexpected, context-dependent role for ERK in G1/S arrest and apoptosis [42]. Similarly, higher-order (not necessarily direct) interactions between molecular species were extracted from a differential equations model of the ErbB signaling network by exploring parameter sensitivities [49].
Bayesian networks represent another popular data-driven modeling strategy [13,50,51], whose usefulness was demonstrated in a landmark study [51] where flow cytometry was used to monitor the activation of 11 phosphoproteins in CD4+ T lymphocytes. Thanks to this experimental technology it was possible to gather a single-cell dataset comprising thousands of correlations, which enormously increased the statistical power of the analysis.
Melas et al. provided an interesting example of combining logic and data-driven modeling [52]. Their aim was to model how the activation of seven transmembrane receptors regulates the secretion of a panel of 22 cytokines. They first assembled from the literature a logic model describing downstream signaling events following the activation of the receptors. This model included 16 phosphoproteins whose activity in different conditions was measured using Luminex technology. In parallel the authors also measured cytokine release in the same conditions. However, a detailed pathway describing how the activation of the phosphoproteins leads to cytokine release was not available. Therefore the authors used a data-driven regression approach to connect each cytokine with the phosphoproteins that affect its release, according to the regression model. The structure of the combined pathway was then optimized against experimental data using Integer Linear Programming.
Sacco et al. used logic modeling to obtain a higher-detail mapping of gene products onto complex pathways on a large scale [53].
They assembled and optimized a logic model describing growth pathways in HeLa cells. The model was then used to map on specific nodes 41 phosphatases whose inhibition was found to modulate the same pathways in a high-content siRNA screening. The mapping was based on in silico simulations with the model under the assumption that, if the effect of inhibiting a phosphatase is the same as up-regulating a node of the model, then the phosphatase must be an inhibitor of that node.
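The mapping assumption used in the phosphatase study (if inhibiting a phosphatase has the same effect as up-regulating a node, the phosphatase is taken to inhibit that node) can be sketched with a toy logic model. Everything below is hypothetical: the three-node model, the node names and the observed knock-down profile are invented for illustration and are unrelated to the published HeLa model.

```python
# Hypothetical toy logic model: each node's state is a Boolean function
# of the states computed so far (stimulus first, then upstream nodes).
RULES = {
    "RAF": lambda s: s["stimulus"],
    "ERK": lambda s: s["RAF"],
    "AKT": lambda s: s["stimulus"],
}
ORDER = ["RAF", "ERK", "AKT"]  # upstream nodes are evaluated first


def simulate(stimulus, clamped=None):
    """Evaluate the model once; clamped nodes are forced to a fixed value."""
    clamped = clamped or {}
    state = {"stimulus": stimulus}
    for node in ORDER:
        if node in clamped:
            state[node] = clamped[node]
        else:
            state[node] = RULES[node](state)
    return {node: state[node] for node in ORDER}


def candidate_targets(observed, stimulus=0):
    """Nodes whose in silico up-regulation reproduces the observed profile."""
    return [
        node for node in ORDER
        if simulate(stimulus, {node: 1}) == observed
    ]


# Made-up readout after knocking down a phosphatase in unstimulated cells:
# ERK becomes active while RAF and AKT stay inactive.
observed_after_knockdown = {"RAF": 0, "ERK": 1, "AKT": 0}
targets = candidate_targets(observed_after_knockdown)
```

In this toy case only the forced activation of ERK reproduces the observed profile, so the phosphatase would be mapped onto ERK as a putative inhibitor. Measuring several nodes matters: with a single readout downstream of both RAF and ERK the two candidates could not be distinguished.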
Conclusions and future perspectives The modeling of signal transduction network is an exciting and active field of research that promises to foster our understanding of cellular programs and their deregulation, particularly in cancer [54]. Probing the signaling networks has already been shown to be useful for the development of new therapeutic regimens [3] and for prognosis and patient stratification [55,56]. However we have only scratched the surface of the possible translational implications of systems biology [57]. From an experimental point of view we believe that the most promising developments are in quantitative directed proteomics and in single-cell technologies, such as mass cytometry. The power of the former has yet to be fully harnessed in a comprehensive modeling application. By contrast single-cell technologies allow a depth of profiling that is crucial to increase our comprehension of cancer [58]. Indeed tumors are heterogeneous tissues composed of different malignant cells populations as well as stromal and
New Biotechnology Volume 30, Number 3 March 2013
Whatever developments lie ahead, the long-term benefit of these studies will be a new generation of therapies where the pharmacological perturbation of genes and proteins is addressed in a network context, thus allowing the development of effective drug combinations as well as uncovering opportunities for drug repositioning.
Acknowledgements We thank Prof. Gianni Cesareni and Dr. Francesca Sacco for useful discussions. This work was supported by grant AIRC IG10298.
References [1] Jones S, Zhang X, Parsons DW, Lin JC-H, Leary RJ, Angenendt P, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008;321:1801–6. [2] Pawson T, Linding R. Network medicine. FEBS Letters 2008;582:1266–70. [3] Lee MJ, Ye AS, Gardino AK, Heijink AM, Sorger PK, MacBeath G, et al. Sequential application of anticancer drugs enhances cell death by rewiring apoptotic signaling networks. Cell 2012;149:780–94. [4] Terfve C, Saez-Rodriguez J. Modeling signaling networks using high-throughput phospho-proteomics. Advances in Experimental Medicine and Biology 2012;736:19–57. [5] Alexopoulos LG, Saez Rodriguez J, Espelin CW. High-throughput protein-based technologies and computational models for drug development, efficacy, and toxicity. In: Sean Ekins, Jinghai J. Xu, editors. Drug Efficacy, Safety, and Biologics Discovery: Emerging Technologies and Tools. John Wiley & Sons, Inc.; 2009. p. 29–52. [6] Pepperkok R, Ellenberg J. High-throughput fluorescence microscopy for systems biology. Nature Reviews Molecular Cell Biology 2006;7:690–6. [7] Zanella F, Lorens JB, Link W. High content screening: seeing is believing. Trends in Biotechnology 2010;28:237–45. [8] Spurrier B, Honkanen P, Holway A, Kumamoto K, Terashima M, Takenoshita S, et al. Protein and lysate array technologies in cancer research. Biotechnology Advances 2008;26:361–9. [9] Kingsmore SF. Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nature Reviews Drug Discovery 2006;5:310–20. [10] Seidel M, Niessner R. Automated analytical microarrays: a critical review. Analytical and Bioanalytical Chemistry 2008;391:1521–44. [11] Austin J, Holway AH. Contact printing of protein microarrays. Methods in Molecular Biology 2011;785:379–94. [12] Paweletz CP, Charboneau L, Bichsel VE, Simone NL, Chen T, Gillespie JW, et al. 
Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene 2001;20:1981–9. [13] Ciaccio MF, Wagner JP, Chuu C-P, Lauffenburger DA, Jones RB. Systems analysis of EGF receptor signaling dynamics with microwestern arrays. Nature Methods 2010;7:148–55. [14] Fulton RJ, McDade RL, Smith PL, Kienker LJ, Kettman JR. Advanced multiplexed analysis with the FlowMetrix system. Clinical Chemistry 1997;43:1749–56. [15] Bendall SC, Nolan GP, Roederer M, Chattopadhyay PK. A deep profiler’s guide to cytometry. Trends in Immunology 2012;33:323–32. [16] Bandura DR, Baranov VI, Ornatsky OI, Antonov A, Kinach R, Lou X, et al. Technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Analytical Chemistry 2009;81:6813–22. [17] Bendall SC, Simonds EF, Qiu P, Amir E-AD, Krutzik PO, Finck R, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 2011;332:687–96. [18] Krutzik PO, Hale MB, Nolan GP. Characterization of the murine immunological signaling network with phosphospecific flow cytometry. Journal of Immunology 2005;175:2366–73. [19] Bodenmiller B, Zunder ER, Finck R, Chen TJ, Savig ES, Bruggner RV, et al. Multiplexed mass cytometry profiling of cellular states perturbed by smallmolecule regulators. Nature Biotechnology 2012;30(9):858–67. [20] Steen H, Mann M. The ABC’s (and XYZ’s) of peptide sequencing. Nature Reviews Molecular Cell Biology 2004;5:699–711. [21] Picotti P, Aebersold R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nature Methods 2012;9:555–66. [22] Ong S-E, Mann M. Mass spectrometry-based proteomics turns quantitative. Nature Chemical Biology 2005;1:252–62. [23] Ong S-E, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, et al. 
Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Molecular & Cellular Proteomics 2002;1:376–86. [24] Gerber SA, Rush J, Stemman O, Kirschner MW, Gygi SP. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proceedings of the National Academy of Sciences of the United States of America 2003;100:6940–5.
[25] Kholodenko B, Yaffe MB, Kolch W. Computational approaches for analyzing information flow in biological networks. Science Signaling 2012;5:re1.
[26] Janes KA, Yaffe MB. Data-driven modelling of signal-transduction networks. Nature Reviews Molecular Cell Biology 2006;7:820–8.
[27] Geladi P, Kowalski BR. Partial least-squares regression: a tutorial. Analytica Chimica Acta 1986;185:1–17.
[28] Pe’er D. Bayesian network analysis of signaling networks: a primer. Science’s STKE 2005;2005:pl4.
[29] Aldridge BB, Saez-Rodriguez J, Muhlich JL, Sorger PK, Lauffenburger DA. Fuzzy logic analysis of kinase pathway crosstalk in TNF/EGF/insulin-induced signaling. PLoS Computational Biology 2009;5:e1000340.
[30] Morris MK, Saez-Rodriguez J, Sorger PK, Lauffenburger DA. Logic-based models for the analysis of cell signaling networks. Biochemistry 2010;49:3216–24.
[31] Saez-Rodriguez J, Alexopoulos LG, Epperlein J, Samaga R, Lauffenburger DA, Klamt S, et al. Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Molecular Systems Biology 2009;5:331.
[32] MacNamara A, Terfve C, Henriques D, Bernabé BP, Saez-Rodriguez J. State-time spectrum of signal transduction logic models. Physical Biology 2012;9:045003.
[33] Morris MK, Saez-Rodriguez J, Clarke DC, Sorger PK, Lauffenburger DA. Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli. PLoS Computational Biology 2011;7:e1001099.
[34] Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 2002;18:261–74.
[35] Gat-Viks I, Tanay A, Raijman D, Shamir R. A probabilistic methodology for integrating knowledge and experiments on biological networks. Journal of Computational Biology 2006;13:165–81.
[36] Aldridge BB, Burke JM, Lauffenburger DA, Sorger PK. Physicochemical modelling of cell signalling pathways. Nature Cell Biology 2006;8:1195–203.
[37] Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP. Universally sloppy parameter sensitivities in systems biology models. PLoS Computational Biology 2007;3:1871–8.
[38] Huang PH, Mukasa A, Bonavia R, Flynn RA, Brewer ZE, Cavenee WK, et al. Quantitative analysis of EGFRvIII cellular signaling networks reveals a combinatorial therapeutic strategy for glioblastoma. Proceedings of the National Academy of Sciences of the United States of America 2007;104:12867–72.
[39] Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, et al. Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Molecular & Cellular Proteomics 2005;4:1240–50.
[40] Blagoev B, Ong S-E, Kratchmarova I, Mann M. Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nature Biotechnology 2004;22:1139–45.
[41] Wolf-Yadlin A, Hautaniemi S, Lauffenburger DA, White FM. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proceedings of the National Academy of Sciences of the United States of America 2007;104:5860–5.
[42] Tentner AR, Lee MJ, Ostheimer GJ, Samson LD, Lauffenburger DA, Yaffe MB. Combined experimental and computational analysis of DNA damage signaling reveals context-dependent roles for Erk in apoptosis and G1/S arrest after genotoxic stress. Molecular Systems Biology 2012;8:568.
[43] Lau KS, Juchheim AM, Cavaliere KR, Philips SR, Lauffenburger DA, Haigis KM. In vivo systems analysis identifies spatial and temporal aspects of the modulation of TNF-α-induced apoptosis and proliferation by MAPKs. Science Signaling 2011;4:ra16.
[44] Kumar N, Wolf-Yadlin A, White FM, Lauffenburger DA. Modeling HER2 effects on cell behavior from mass spectrometry phosphotyrosine data. PLoS Computational Biology 2007;3:e4.
[45] Janes KA, Gaudet S, Albeck JG, Nielsen UB, Lauffenburger DA, Sorger PK. The response of human epithelial cells to TNF involves an inducible autocrine cascade. Cell 2006;124:1225–39.
[46] Wolf-Yadlin A, Kumar N, Zhang Y, Hautaniemi S, Zaman M, Kim H-D, et al. Effects of HER2 overexpression on cell signaling networks governing proliferation and migration. Molecular Systems Biology 2006;2:54.
infiltrating immune cells. Probing the interactions among these different players is fundamental to bringing our understanding of disease progression to the next level, thus paving the way for the development of effective therapeutic strategies. From a computational perspective, a crucial issue is how to integrate prior knowledge into data-driven approaches and, conversely, how literature-derived models can be systematically augmented by extracting the most significant signals from the data. While several works pursuing these strategies have been published [52,59], much remains to be done.
[47] Janes KA, Albeck JG, Gaudet S, Sorger PK, Lauffenburger DA, Yaffe MB. A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science 2005;310:1646–53.
[48] Janes KA, Kelly JR, Gaudet S, Albeck JG, Sorger PK, Lauffenburger DA. Cue-signal-response analysis of TNF-induced apoptosis by partial least squares regression of dynamic multivariate data. Journal of Computational Biology 2004;11:544–61.
[49] Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, Lauffenburger DA, et al. Input–output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Molecular Systems Biology 2009;5:239.
[50] Woolf PJ, Prudhomme W, Daheron L, Daley GQ, Lauffenburger DA. Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics 2005;21:741–53.
[51] Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science 2005;308:523–9.
[52] Melas IN, Mitsos A, Messinis DE, Weiss TS, Alexopoulos LG. Combined logical and data-driven models for linking signalling pathways to cellular response. BMC Systems Biology 2011;5:107.
[53] Sacco F, Gherardini PF, Paoluzi S, Saez-Rodriguez J, Helmer-Citterich M, Ragnini-Wilson A, et al. Mapping the human phosphatome on growth pathways. Molecular Systems Biology 2012;8:1–15.
[54] Pe’er D, Hacohen N. Principles and strategies for developing network models in cancer. Cell 2011;144:864–73.
[55] Irish JM, Myklebust JH, Alizadeh AA, Houot R, Sharman JP, Czerwinski DK, et al. B-cell signaling networks reveal a negative prognostic human lymphoma cell subset that emerges during tumor progression. Proceedings of the National Academy of Sciences of the United States of America 2010;107:12747–54.
[56] Irish JM, Hovland R, Krutzik PO, Perez OD, Bruserud Ø, Gjertsen BT, et al. Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 2004;118:217–28.
[57] Erler JT, Linding R. Network medicine strikes a blow against breast cancer. Cell 2012;149:731–3.
[58] Bendall SC, Nolan GP. From single cells to deep phenotypes in cancer. Nature Biotechnology 2012;30:639–47.
[59] Eduati F, De Las Rivas J, Di Camillo B, Toffolo G, Saez-Rodriguez J. Integrating literature-constrained and data-driven inference of signalling networks. Bioinformatics 2012;28:2311–7.