Modelling heterogeneous intracellular networks at the cellular scale

Journal Pre-proof

Ann C. Babtie

PII: S2452-3100(19)30039-3
DOI: https://doi.org/10.1016/j.coisb.2019.10.014
Reference: COISB 265
To appear in: Current Opinion in Systems Biology
Received Date: 28 April 2019
Revised Date: 30 October 2019
Accepted Date: 30 October 2019

Please cite this article as: Babtie AC, Modelling heterogeneous intracellular networks at the cellular scale, Current Opinion in Systems Biology, https://doi.org/10.1016/j.coisb.2019.10.014. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.

Ann C. Babtie1,*

1 Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
* [email protected]

Abstract

Cell function relies on the coordinated action of heterogeneous interconnected networks of biomolecules. Mathematical models help us explore the dynamics and behaviour of these intracellular networks in greater detail. Models of increasing scale and complexity are being developed to probe cellular processes, often necessitating the use of several types of mathematical representation in hybrid models. Here we review recent efforts to incorporate the influences of stochasticity and spatial heterogeneity into cellular level models, ranging from abstract coarse-grained representations to large-scale hybrid models comprising thousands of biological components. We discuss the key challenges involved in, and recent mathematical advances enabling, the development and analysis of mathematical models of complex intracellular processes.

Highlights

• Stochasticity and spatial heterogeneity play critical roles in cell function
• Mathematical models allow us to study complex intracellular dynamics
• Recent advances enable spatial and temporal heterogeneity to be modelled
• Models with broad ranges of scope and scale provide insight into cell function

Keywords

heterogeneity, hybrid model, inference, simulation, stochastic, whole-cell model

Introduction

Mathematical modelling has provided numerous insights into the behaviour and dynamics of biological systems. The wealth of data generated using modern high-throughput experimental techniques, together with increasing computational power, has led to models of increasing scope and scale in recent years. The complexity of interconnected intracellular processes is driving interest in creating models that are hybrid and/or multiscale in nature to help us gain insight into cellular function.
Here, we focus on dynamical models that capture the influence of temporal or spatial heterogeneity on intracellular networks, and allow simulation of network behaviour over time. We consider both the challenges of scaling modelling approaches to create large, detailed representations of hundreds or thousands of biomolecules, and much more abstract coarse-grained cell models that enable efficient analysis of selected features.

Biological systems are frequently modelled using ordinary differential equations (ODEs), with an established array of mathematical methods available to help us construct, select, parameterise, simulate, and validate suitable models. However, a deterministic approach neglects stochasticity, which can strongly influence system dynamics and behaviour, and plays a critical role in many biological processes [1-3]. A more comprehensive understanding of cell behaviour requires us to consider the effects of temporal and spatial heterogeneity; this calls for alternative modelling formalisms and, as we aim to increase the scope and scale of models, often 'hybrid' models comprising several distinct types of model coupled together.

Even when attempting to model biological systems in great detail, any mathematical model remains a simplification of the true processes. Particularly when modelling biological phenomena, we cannot base model structures on fundamental physical laws but necessarily require more abstract representations. To be confident that a model provides useful insight into the cellular processes we aim to mimic, we need to consider how to select an appropriate model structure and parameter values based on prior knowledge and experimental data, and how to quantify the inevitable uncertainties associated with it. Development of models of increasing scale, complexity and heterogeneity needs to be accompanied by improved methods to assess model quality and validity; to identify areas for refinement and improvement; and to explore their limitations so that we can trust the predictions and insight we gain.

Modelling heterogeneous intracellular networks

Cell development and function rely on the precisely controlled, coordinated action of networks of interacting molecular species and key intracellular processes, including metabolism, gene expression, signalling and replication. A variety of mathematical formalisms have been applied to model aspects of cell biology [4-6]; the type of model selected to represent a particular process depends on factors including data availability, prior knowledge, quantities of molecular species, the modelling aim, and analytical tractability. Of course, biological processes do not function in isolation, and to model heterogeneous cellular networks we either have to accommodate multiple cellular processes within the same mathematical representation or create hybrid models.

Metabolic models are probably the best established cellular level models, with genome-scale metabolic reconstructions and constraint-based modelling methods (e.g. flux balance analysis) providing stoichiometric descriptions of biochemical reactions and fluxes [7,8]. Such methods have been extended and adapted to incorporate kinetic information and other aspects of cell biology, e.g. gene expression, into metabolic models [8-13].
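
To make the constraint-based idea concrete, the following is a minimal sketch of flux balance analysis posed as a linear programme: maximise one flux subject to the steady-state condition S v = 0 and bounds on each flux. The four-reaction network, its bounds and the function name `flux_balance` are hypothetical choices made for illustration and are not taken from any of the cited reconstructions; SciPy's general-purpose `linprog` solver is used rather than a dedicated metabolic modelling package.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative only, not from the cited models):
#   v0: -> A (uptake)      v1: A -> B
#   v2: A -> (by-product)  v3: B -> (biomass)
# Rows are metabolites A and B; columns are reactions v0..v3.
S = np.array([
    [1.0, -1.0, -1.0,  0.0],  # metabolite A
    [0.0,  1.0,  0.0, -1.0],  # metabolite B
])

# Assumed flux bounds: uptake capped at 10, conversion A -> B capped at 6.
bounds = [(0.0, 10.0), (0.0, 6.0), (0.0, None), (0.0, None)]


def flux_balance(S, bounds, objective_index):
    """Maximise one flux subject to steady state (S v = 0) and flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimises, so negate the objective
    result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = flux_balance(S, bounds, objective_index=3)
print("optimal biomass flux:", fluxes[3])
```

In this toy case the biomass flux is limited by the bound on v1, while the uptake and by-product fluxes are not uniquely determined, a reminder that an optimal objective value need not correspond to a unique flux distribution.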
The assumptions central to constraint-based methods (e.g. assumptions of steady state and optimality of selected biological goals such as growth) are less applicable to other cellular processes, including signalling and transcriptional regulation, so detailed dynamic models of these networks instead rely on alternative formalisms such as ODE or Boolean models [14-18].

Modelling at a whole cell level

Efforts to represent heterogeneous processes at a cellular scale have led to diverse approaches and models (Figure 1). Several of the fine-grained detailed models of selected networks and processes mentioned above aim to scale to the cellular level, but most are deterministic in nature or lack the kinetic information to generate dynamic simulations. Exceptions include stochastic simulations of specific signalling pathways [14], or models containing some stochastic elements such as the representation of gene expression in human cancer related pathways [17].

At a much smaller scale, 'coarse-grained' models of cell physiology represent key intracellular processes and their interconnections but, by design, in a very abstract and simplified way. Despite their apparent simplicity, such models can capture, for example, observed microbial growth laws and dynamics [19-22]. While many of these models are again deterministic, stochastic models of Escherichia coli cells have also been developed, linking the demands of processes such as transcription, metabolism and replication to bacterial growth and enabling variability in cell populations to be explored [21,22]. Such models represent generic categories of molecular components rather than attempting to include any detail about specific genes and proteins, thus limiting models to a scale that is computationally tractable and amenable to well-established analytical tools.
At the opposite end of the scale, and constituting perhaps the most ambitious attempt to date to create a detailed comprehensive model of cell function, is the whole-cell model (WCM) of a Mycoplasma genitalium bacterium that represents all annotated gene functions [23]. This hybrid stochastic-deterministic model comprises 28 submodels of different cellular processes and thousands of parameters. The model approximates experimentally observed data such as cellular chemical composition and metabolite concentrations, and it has been used to analyse various aspects of cell behaviour including protein-DNA binding dynamics, cell cycle regulation, and kinetic rate constants [23,24]. Several groups are leading efforts to extend such approaches to far more complex cells, e.g. human embryonic stem cells [25]; however, the increased scale and complexity required will necessitate the development of improved and novel tools for steps including data collection, integration and model construction, through the coordinated efforts of many research teams [25,26]. Such complex models will also require new approaches to deal with tasks such as model simulation, parameterisation and validation, as existing tools for model analysis are generally not scalable to this level [27].

While some models mentioned above use stochastic modelling to account for the temporal heterogeneity of biological processes (especially important for processes such as transcription and signalling that involve small numbers of interacting molecules), they largely overlook the influence of spatial organisation and heterogeneity within cells. The intracellular environment is known to be highly crowded, with densely packed biomolecules, and varying degrees of compartmentalisation and internal structural constraints. Widely used modelling approaches such as ODEs and stochastic differential equations (SDEs) assume homogeneous, well-mixed conditions, where movement of molecules is not restricted [28,29]. Modern microscopy and imaging technologies provide great detail about internal cell structure, but incorporating structural features into model simulations requires us to use modelling formalisms that account for spatial heterogeneity.
For intracellular modelling this is predominantly done using particle-based simulation algorithms or the reaction-diffusion master equation (RDME) approximation [28]. The latter has been successfully used for detailed simulations of selected biological processes within the context of a structured and crowded cellular environment (including protein localisation and ribosome biogenesis in bacterial cells and RNA splicing in human cells) [30-32]. These results showed that considering spatial heterogeneity was crucial to correctly mimic experimental observations, and provided insight into the potential utility of certain aspects of the observed spatial organisation.

Challenges of modelling spatial and temporal heterogeneity

Considerable challenges are posed by incorporating temporal and spatial heterogeneity into models, yet efforts to do so highlight that such approaches are essential to understand the outcome of many biological processes. Stochastic simulation presents much greater computational demands than deterministic ODE modelling, as multiple probabilistic simulations are required to understand variability in behaviour; as the scale and complexity of models increase these demands escalate. For smaller models, exact sample path simulations are possible using stochastic simulation algorithms (SSA), but these quickly become impractical at larger scales [29,33]. Various approaches have been developed that extend the scope of stochastic modelling to larger systems through approximations (e.g. diffusion approximation, system size expansion, or moment closure methods); and hybrid simulation algorithms aim to adaptively model different parts of the system in the manner best suited to deal with the relevant particle numbers (e.g. using SSA and an approximation method for low and higher copy number species respectively) [29,33].
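
As an illustration of exact sample path simulation, the sketch below applies a Gillespie-type SSA to a single-species birth-death process (constant production, first-order degradation). The model, the rate constants and the function name `gillespie_birth_death` are assumptions chosen for brevity rather than anything taken from the cited studies. It also shows why cost escalates: every reaction event is simulated individually, and many repeated runs are needed to characterise variability.

```python
import numpy as np


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Exact sample path for 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x)."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod        # propensity of production
        a_deg = k_deg * x      # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break              # no reaction can fire
        # Waiting time to the next event is exponential with mean 1/a_total.
        t += rng.exponential(1.0 / a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)


# Hypothetical parameters: the stationary copy number has mean k_prod / k_deg = 100.
rng = np.random.default_rng(seed=1)
final_counts = np.array([
    gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=50.0, rng=rng)[1][-1]
    for _ in range(200)
])
print("mean:", final_counts.mean(), "variance:", final_counts.var())
```

For this linear system the copy number distribution approaches a Poisson distribution, so the sample mean and variance across runs should be similar; a deterministic ODE would reproduce only the mean.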
Including spatial information in stochastic simulations introduces yet more complexity and again demands alternative approximation methods for modelling at a cellular level [28].

Despite advances in approximation methods enabling stochastic simulation of more complex systems, purely stochastic detailed models of multiple heterogeneous intracellular pathways remain out of reach, motivating the use of large-scale hybrid models. Such models enable us to integrate heterogeneous datasets, probe our understanding of complex biological dynamics, and highlight areas for further experiments. However, significant challenges remain around how best to construct, analyse and use such complex models.

Linking models to experimental data

Experimental technologies to generate genomic, molecular and imaging data continue to advance rapidly, leading to a huge wealth of information about cellular processes. Data are collected at ever increasing throughput and resolution, with the capability to gather genetic, epigenetic, spatial and proteomic information at the single-cell level and, in the case of multimodal techniques, to profile multiple aspects of a cell simultaneously [34]; additionally, there are coordinated efforts to understand cell-type-specific and individual-specific variation [35-37]. Effective ways to collate, integrate, and share these data as they amass are clearly essential.

Many databases provide detailed accumulated knowledge about intracellular networks, often tailored towards specific cell types or diseases, e.g. [35,38-40]. These data inform our current understanding of the molecular details of intracellular pathways, and thus our model structures. However, most dynamical models also require quantitative parameter values representing e.g. reaction rates or binding constants. One approach is to rely on suitable experimental measurements or estimate values from relevant literature sources; the original M. genitalium WCM, for example, used data from over 900 publications [23]. However, such an approach has important limitations: experimentally derived values may not accurately reflect in vivo parameters [41], some parameters of interest will not be experimentally accessible, and the uncertainties of such estimates are often unknown. Statistical inference methods instead aim to optimise parameter values by minimising the discrepancy between some aspect of the model output and observed experimental data.
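
A minimal sketch of this idea is given below: two parameters of a first-order decay model are estimated by nonlinear least squares from synthetic noisy data. The model, the 'true' parameter values, the noise level and the starting guess are all hypothetical, and the standard errors rely on a local linear approximation around the optimum, so they are indicative only and no substitute for the fuller uncertainty quantification discussed in the text.

```python
import numpy as np
from scipy.optimize import least_squares


def model(params, t):
    """First-order decay x(t) = x0 * exp(-k * t); a stand-in for a real model."""
    x0, k = params
    return x0 * np.exp(-k * t)


def residuals(params, t, y):
    """Discrepancy between model output and observations."""
    return model(params, t) - y


# Synthetic 'experimental' data generated from assumed parameter values.
rng = np.random.default_rng(seed=0)
t_obs = np.linspace(0.0, 10.0, 25)
true_params = np.array([50.0, 0.4])
y_obs = model(true_params, t_obs) + rng.normal(0.0, 1.0, size=t_obs.size)

# Minimise the sum of squared residuals, constraining parameters to be non-negative.
fit = least_squares(residuals, x0=[20.0, 0.1], args=(t_obs, y_obs),
                    bounds=([0.0, 0.0], [np.inf, np.inf]))

# Approximate parameter covariance from the Jacobian at the optimum.
dof = t_obs.size - fit.x.size
sigma2 = 2.0 * fit.cost / dof  # fit.cost is half the sum of squared residuals
cov = sigma2 * np.linalg.inv(fit.jac.T @ fit.jac)
std_err = np.sqrt(np.diag(cov))

print("estimates:", fit.x)
print("approximate standard errors:", std_err)
```

Even in this two-parameter case the estimates are correlated through the covariance matrix; with thousands of parameters, many directions in parameter space are typically poorly constrained by the data.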
Depending on the inference method, we may be able to incorporate prior knowledge and obtain some measure of the uncertainty of inferred parameter values; Bayesian methods enable inference of the full joint parameter probability distribution. While inference helps overcome some limitations of experimental parameterisation, it is challenging to adapt these techniques to the scale required for detailed cellular models, particularly when considering temporal and spatial heterogeneity or hybrid models [27,33]. Method development is ongoing though, with improved simulation algorithms and likelihood-free approaches benefitting inference for stochastic models [33] and inference becoming feasible even for ODE models with thousands of species and parameters [18,42]. However, these and other examples of large-scale biological models highlight that many parameter values cannot be constrained by the available experimental data (and in some cases remain unidentifiable regardless of data limitations) [18,43,44]. The challenges of parameterising models are compounded when considering hybrid models.

Challenges of large-scale and hybrid modelling

Hybrid models of biological systems comprise multiple different models that are coupled in a manner allowing simulation of the complete system; this requires us to define shared variables that link individual submodels, enabling transfer of information between them. To realistically reflect tightly integrated cellular processes, a simulation scheme should allow regular communication between modules [6]. For the M. genitalium WCM this was achieved by assuming that on a short time-scale each module could be considered independently, with shared model variables updated after each short time advance [23]. Similarly, the RDME approximation for incorporating spatial heterogeneity into stochastic models assumes that, for selected length and time scales, small subvolumes can be treated as individual homogeneous units and simulated independently [28].
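
The coupling scheme described above can be sketched with two toy submodels that exchange information only through shared variables at the end of each short time step. This is an illustrative assumption-laden example, not the WCM implementation: the stochastic mRNA update uses a simple Poisson/binomial approximation over the step, the protein update is an explicit Euler step, and all function names and rate constants are hypothetical.

```python
import numpy as np


def stochastic_mrna_step(mrna, dt, k_tx, k_deg, rng):
    """Stochastic submodel: approximate update of the mRNA count over one step."""
    made = rng.poisson(k_tx * dt)                              # new transcripts
    lost = rng.binomial(mrna, 1.0 - np.exp(-k_deg * dt))       # degraded transcripts
    return mrna + made - lost


def deterministic_protein_step(protein, mrna, dt, k_tl, k_dil):
    """Deterministic submodel: Euler step of dP/dt = k_tl * mRNA - k_dil * P."""
    return protein + dt * (k_tl * mrna - k_dil * protein)


def simulate_hybrid(t_end, dt, rng, k_tx=2.0, k_deg=0.2, k_tl=5.0, k_dil=0.05):
    """Advance both submodels independently per step, then update shared state."""
    state = {"mrna": 0, "protein": 0.0}   # shared variables linking the submodels
    n_steps = int(round(t_end / dt))
    history = np.empty((n_steps + 1, 2))
    history[0] = (state["mrna"], state["protein"])
    for i in range(1, n_steps + 1):
        # Each submodel reads the shared state as it was at the start of the step.
        new_mrna = stochastic_mrna_step(state["mrna"], dt, k_tx, k_deg, rng)
        new_protein = deterministic_protein_step(
            state["protein"], state["mrna"], dt, k_tl, k_dil)
        # Shared variables are updated only after both submodels have advanced.
        state["mrna"], state["protein"] = new_mrna, new_protein
        history[i] = (new_mrna, new_protein)
    return history


history = simulate_hybrid(t_end=200.0, dt=0.1, rng=np.random.default_rng(0))
print("final mRNA count and protein level:", history[-1])
```

The step size `dt` controls how often the submodels communicate: a larger step is cheaper but weakens the assumption that each module can be treated independently within a step.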
Partitioning a system based on suitable time, length and/or concentration scales can enable efficient simulation through approximations and coupling of different modelling formalisms (e.g. deterministic and stochastic elements) in hybrid models. However, deciding how best to partition a system is far from trivial, often relying on ad hoc and heuristic methods, and some systems may require more sophisticated adaptive partitioning approaches (e.g. where species numbers vary significantly during the course of simulations) [6,28,29].

Such approaches successfully allow us to simulate dynamics of these complex systems, but linking models this way renders many model analysis tools inapplicable. Despite the success at mimicking the observed behaviour of biological systems, we lack ways to rigorously analyse the models and, e.g. quantify uncertainties, perform sensitivity analyses, infer parameters for a coupled model, and identify areas needing refinement. It has repeatedly been demonstrated that, even with small simple models, many model structures and parameters can often reproduce a given behaviour. To trust the predictions and conclusions drawn from a model, and to understand their inevitable limitations and biases, it is vital to be able to rigorously analyse and validate the model [6,27,45].

Future developments for large-scale hybrid models

The challenges facing development and analysis of large hybrid models are motivating research in many areas. Although thorough analysis of the complete complex hybrid models will likely remain elusive for some time, continual mathematical advances will aid development of the constituent submodels. As discussed above, tools for creating and analysing stochastic and deterministic models are now applicable at increasing scales. Alternative methods for representing complex model structures are being explored [46-48]; perhaps most relevant to detailed hybrid models are 'rule-based' methods, which concisely represent models with large state spaces by defining patterns and reaction rules for state transitions (rather than explicitly defining all possible states).
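
The saving offered by a rule-based description can be illustrated with a hypothetical protein carrying n independent modification sites: enumerating every state requires 2^n entries, whereas two rules per site suffice to describe all transitions. The sketch below is plain Python written for illustration; it does not use the syntax of any rule-based modelling language, and the function names are assumptions.

```python
from itertools import product


def enumerate_states(n_sites):
    """Explicit state space: every combination of unmodified (0) / modified (1) sites."""
    return list(product((0, 1), repeat=n_sites))


def make_rules(n_sites):
    """Two rules per site (modify, unmodify), each independent of the other sites."""
    rules = []
    for site in range(n_sites):
        rules.append((site, 0, 1))  # unmodified -> modified
        rules.append((site, 1, 0))  # modified -> unmodified
    return rules


def apply_rule(state, rule):
    """Return the new state if the rule's pattern matches, otherwise None."""
    site, before, after = rule
    if state[site] != before:
        return None
    new_state = list(state)
    new_state[site] = after
    return tuple(new_state)


n_sites = 10
print("explicit states:", len(enumerate_states(n_sites)))  # grows as 2**n_sites
print("rules needed:", len(make_rules(n_sites)))           # grows as 2 * n_sites
```

Because a rule is matched against the current state only when needed, a simulation can apply rules to the states that actually arise without ever constructing the full list returned by `enumerate_states`.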
Rule-based methods have been used to represent large signalling pathways where multiple protein modifications and interactions lead to high combinatorial complexity [47,48]. Network-free stochastic simulation algorithms enable efficient simulation of these models by probabilistically sampling reactions and updating states as they are used rather than enumerating all potential possibilities, many of which will not be encountered in a given scenario [14,48]. Such approaches are central to proposed efforts to create detailed WCMs of more complex cells [25,26].

Despite continuing progress in simulating complex hybrid models, alternative approaches that involve simplifying aspects of such models are likely also needed, particularly to make simulation efficient enough for tasks such as model calibration and uncertainty quantification. Although we increasingly have tools to achieve these tasks for individual submodels, when such models are combined there will inevitably be additional complexities and uncertainties introduced, so it is important to also assess the validity of the combined model [6,27]. Model reduction methods aim to reduce model size while preserving its behaviour [6]; automated ways to identify where model complexity can be reduced need to be developed. Alternatively, 'emulators' (or surrogate models) focus purely on mimicking the input-output behaviour of a model while ignoring the internal model structure, providing simplified approximations for complex multivariate models [6,49]; such approaches have been widely used in engineering disciplines. Emulation can combine the strengths of machine learning and mechanistic modelling, by using simulated input-output data from the original model to train a statistical model able to approximate its behaviour [49]. Both model reduction and emulation can decrease the computational cost of simulations, opening up further possibilities for calibration and analysis of hybrid models (e.g. selected submodels could be optimised and refined while using simplified versions of other submodels).

Conclusion

Recent years have seen continual advancements in approaches to model complex biological systems and account for the influences of temporal and spatial heterogeneity. Large-scale hybrid modelling receives a lot of attention, and drawing together detailed molecular-level knowledge of all aspects of a cell's function and heterogeneous datasets into a cohesive modelling framework is a tantalising prospect. Such efforts will drive exciting developments for all stages of a modelling study, from generating and curating experimental data to constructing, parameterising and analysing such models [25,26]. It is worth noting though that for the foreseeable future such complex models are likely to remain relatively intractable to detailed model analysis tasks, e.g. optimisation, model selection, uncertainty quantification and refinement. Models of reduced complexity, abstract coarse-grained models, and detailed models of selected aspects of cell function are much more amenable to such approaches and thus have a critical role to play in helping us gain insight into cellular processes and the influences of heterogeneity. Of course, knowledge learned from such models will help inform the construction of the components of more complex models.

Simulating the complex dynamics of heterogeneous interconnected networks in detail requires similarly complex models, but there is a trade-off between model complexity and our ability to thoroughly analyse a model. By neglecting some level of detail, or some intracellular processes, coarser-grained or focused fine-grained models can be targeted to specific questions and analysed in greater depth. This better enables us to probe their validity, understand their limitations, and appreciate any biases and uncertainties in modelling predictions and conclusions. Such models are also currently better suited to creating cell-type-specific or individual-specific models, as they can be accessible to existing tools for e.g. model parameterisation and refinement. Clearly, models at many levels of scope and scale have complementary roles to play in advancing our understanding of cell function, and will continue to help us exploit and interpret the vast quantities of richly detailed data now available.

Acknowledgements

ACB gratefully acknowledges support through a BBSRC Future Leaders Fellowship (Grant reference BB/N011597/1).
Figure 1. Comparison of the scale and modelling formalisms used by selected dynamic models of intracellular processes that are discussed in this review. Colours indicate the general type of mathematical representation(s) used in a given model. The x-axis indicates the number of species in a model (note, a dashed line indicates that this is an estimated lower bound for the M. genitalium WCM, calculated as the total number of genes, proteins and metabolites from Karr et al. [23]). The y-axis indicates the number of modelling formalisms used in combination (note, the coarse-grained E. coli model from Bertaux et al. [22] was modelled using both deterministic and stochastic approaches individually).

References

[1] R. Losick, C. Desplan, Stochasticity and cell fate, Science. 320 (2008) 65–68. doi:10.1126/science.1147888.
[2] A. Eldar, M.B. Elowitz, Functional roles for noise in genetic circuits, Nature. 467 (2010) 167–173. doi:10.1038/nature09326.
[3] M.D. Harton, E. Batchelor, Determining the Limitations and Benefits of Noise in Gene Regulation and Signal Transduction through Single Cell, Microscopy-Based Analysis, J. Mol. Biol. 429 (2017) 1143–1154. doi:10.1016/j.jmb.2017.03.007.
[4] N. Le Novère, Quantitative and logic modelling of molecular and gene networks, Nat. Rev. Genet. 16 (2015) 146–158. doi:10.1038/nrg3885.
[5] D. Machado, R.S. Costa, M. Rocha, E.C. Ferreira, B. Tidor, I. Rocha, Modeling formalisms in Systems Biology, AMB Express. 1 (2011) 45. doi:10.1186/2191-0855-1-45.
[6] J. Hasenauer, N. Jagiella, S. Hross, F.J. Theis, Data-Driven Modelling of Biological Multi-Scale Processes, J Coupled Syst Multiscale Dyn. 3 (2015) 101–121. doi:10.1166/jcsmd.2015.1069.
[7] J.D. Orth, T.M. Conrad, J. Na, J.A. Lerman, H. Nam, A.M. Feist, et al., A comprehensive genome-scale reconstruction of Escherichia coli metabolism--2011, Mol. Syst. Biol. 7 (2011) 535–535. doi:10.1038/msb.2011.65.
[8] E.J. O’Brien, J.M. Monk, B.Ø. Palsson, Using Genome-scale Models to Predict Biological Capabilities, Cell. 161 (2015) 971–987. doi:10.1016/j.cell.2015.05.019.
[9] E.J. O’Brien, R.L. Chang, B.Ø. Palsson, Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction, Mol. Syst. Biol. 9 (2013) 693–693. doi:10.1038/msb.2013.52.
[10] O. Øyås, J. Stelling, Genome-scale metabolic networks in time and space, Current Opinion in Systems Biology. 8 (2018) 51–58. doi:10.1016/j.coisb.2017.12.003.
[11] K. Tummler, E. Klipp, The discrepancy between data for and expectations on metabolic models: How to match experiments and computational efforts to arrive at quantitative predictions? Current Opinion in Systems Biology. 8 (2018) 1–6. doi:10.1016/j.coisb.2017.11.003.
[12] J. Carrera, R. Estrela, J. Luo, N. Rai, A. Tsoukalas, I. Tagkopoulos, An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli, Mol. Syst. Biol. 10 (2014) 735–735. doi:10.15252/msb.20145108.
[13] A. Bordbar, D. McCloskey, D.C. Zielinski, N. Sonnenschein, N. Jamshidi, B.Ø. Palsson, Personalized Whole-Cell Kinetic Models of Metabolism for Discovery in Genomics and Pharmacodynamics, Cell Systems. 1 (2015) 283–292. doi:10.1016/j.cels.2015.10.003.
[14]* M.W. Sneddon, J.R. Faeder, T. Emonet, Efficient modeling, simulation and coarse-graining of biological complexity with NFsim, Nat Meth. 8 (2011) 177–183. doi:10.1038/nmeth.1546.
Example of a network-free stochastic simulation algorithm allowing simulation of a very large scale rule-based model of cell signaling.
[15] U. Muenzner, E. Klipp, M. Krantz, A comprehensive, mechanistically detailed, and executable model of the cell division cycle in Saccharomyces cerevisiae, Nature Communications. 10 (2019) 1308. doi:10.1038/s41467-019-08903-w.
[16] J.C. Romers, S. Thieme, U. Muenzner, M. Krantz, A scalable method for parameter-free simulation and validation of mechanistic cellular signal transduction network models, bioRxiv (2018) 1–24. doi:10.1101/107235.
[17] M. Bouhaddou, A.M. Barrette, A.D. Stern, R.J. Koch, M.S. DiStefano, E.A. Riesel, et al., A mechanistic pan-cancer pathway model informed by multi-omics data interprets stochastic cell fate responses to drugs and mitogens, PLoS Comput Biol. 14 (2018) e1005985. doi:10.1371/journal.pcbi.1005985.
[18] F. Fröhlich, T. Kessler, D. Weindl, A. Shadrin, L. Schmiester, H. Hache, et al., Efficient Parameter Estimation Enables the Prediction of Drug Response Using a Mechanistic Pan-Cancer Pathway Model, Cell Systems. 7 (2018) 567–579.e6. doi:10.1016/j.cels.2018.10.013.
[19] A.Y. Weisse, D.A. Oyarzún, V. Danos, P.S. Swain, Mechanistic links between cellular trade-offs, gene expression, and growth, Proceedings of the National Academy of Sciences. 112 (2015) E1038–47. doi:10.1073/pnas.1416533112.
[20] V. Shahrezaei, S. Marguerat, Connecting growth with gene expression: of noise and numbers, Curr. Opin. Microbiol. 25 (2015) 127–135. doi:10.1016/j.mib.2015.05.012.
[21]* P. Thomas, G. Terradot, V. Danos, A.Y. Weisse, Sources, propagation and consequences of stochasticity in cellular growth, Nature Communications. 9 (2018) 4528. doi:10.1038/s41467-018-06912-9.
Describes a stochastic coarse-grained model of bacterial physiology, illustrating how abstract modelling can provide insight into the function of heterogeneous cellular processes.
[22] F. Bertaux, J. Von Kügelgen, S. Marguerat, V. Shahrezaei, A bacterial size law revealed by a coarse-grained model of cell physiology, bioRxiv (2019) 1–21. doi:10.1101/078998.
[23]** J.R. Karr, J.C. Sanghvi, D.N. Macklin, M.V. Gutschow, J.M. Jacobs, B. Bolival, et al., A whole-cell computational model predicts phenotype from genotype, Cell. 150 (2012) 389–401. doi:10.1016/j.cell.2012.05.044.
Describes the M. genitalium WCM, illustrating how a complex hybrid model can represent molecular detail at a cellular level.
[24] J.C. Sanghvi, S. Regot, S. Carrasco, J.R. Karr, M.V. Gutschow, B. Bolival, et al., Accelerated discovery via a whole-cell model, Nat Meth. 10 (2013) 1192–1195. doi:10.1038/nmeth.2724.
[25] B. Szigeti, Y.D. Roth, J.A.P. Sekar, A.P. Goldberg, S.C. Pochiraju, J.R. Karr, A blueprint for human whole-cell modeling, Current Opinion in Systems Biology. 7 (2018) 8–15. doi:10.1016/j.coisb.2017.10.005.
[26]* A.P. Goldberg, B. Szigeti, Y.H. Chew, J.A. Sekar, Y.D. Roth, J.R. Karr, Emerging whole-cell modeling principles and methods, Current Opinion in Biotechnology. 51 (2018) 97–102. doi:10.1016/j.copbio.2017.12.013.
Summarises the challenges posed by the development of larger scale detailed WCM and potential solutions to address these.
[27]* A.C. Babtie, M.P.H. Stumpf, How to deal with parameters for whole-cell modelling, Journal of the Royal Society Interface. 14 (2017) 20170237–11. doi:10.1098/rsif.2017.0237.
Discusses mathematical and statistical tools available for model construction and analysis, and their applicability to large-scale hybrid biological models.
[28]** T.M. Earnest, J.A. Cole, Z. Luthey-Schulten, Simulating biological processes: stochastic physics from whole cells to colonies, Rep. Prog. Phys. 81 (2018) 052601–34. doi:10.1088/1361-6633/aaae2c.
Reviews stochastic simulation methods for biological models, with a focus on spatially-resolved models.
[29] D. Schnoerr, G. Sanguinetti, R. Grima, Approximation and inference methods for stochastic biochemical kinetics—a tutorial review, J. Phys. A: Math. Theor. 50 (2017) 093001–62. doi:10.1088/1751-8121/aa54d9.
[30] T.M. Earnest, J. Lai, K. Chen, M.J. Hallock, J.R. Williamson, Z. Luthey-Schulten, Toward a Whole-Cell Model of Ribosome Biogenesis: Kinetic Modeling of SSU Assembly, Biophysical Journal. 109 (2015) 1117–1135. doi:10.1016/j.bpj.2015.07.030.
[31]* Z. Ghaemi, J. Peterson, M. Gruebele, Z. Luthey-Schulten, An In-Silico Mammalian Whole-Cell Model Reveals the Influence of Spatial Organization on RNA Splicing Efficiency, bioRxiv (2018) 1–10. doi:10.1101/435628.
Describes a spatially-resolved stochastic model of a human cell used to study the influence of spatial heterogeneity on RNA splicing.
[32] D. Fange, J. Elf, Noise-Induced Min Phenotypes in E. coli, PLoS Comput Biol. 2 (2006) e80–12. doi:10.1371/journal.pcbi.0020080.
[33]** D.J. Warne, R.E. Baker, M.J. Simpson, Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art, Journal of the Royal Society Interface. 16 (2019) 20180943–20. doi:10.1098/rsif.2018.0943.
Reviews recent advances in simulation and parameter inference methods for stochastic biological models.
[34] T. Stuart, R. Satija, Integrative single-cell analysis, Nat. Rev. Genet. 20 (2019) 257–272. doi:10.1038/s41576-019-0093-7.
[35] A. Regev, S.A. Teichmann, E.S. Lander, I. Amit, C. Benoist, E. Birney, et al., The Human Cell Atlas, eLife. 6 (2017) 503. doi:10.7554/eLife.27041.
[36] C. Hutter, J.C. Zenklusen, The Cancer Genome Atlas: Creating Lasting Value beyond Its Data, Cell. 173 (2018) 283–285. doi:10.1016/j.cell.2018.03.042.
[37] J.N. Weinstein, E.A. Collisson, G.B. Mills, K.R.M. Shaw, B.A. Ozenberger, K. Ellrott, et al., The Cancer Genome Atlas Pan-Cancer analysis project, Nat. Genet. 45 (2013) 1113–1120. doi:10.1038/ng.2764.
[38] N. Swainston, K. Smallbone, H. Hefzi, P.D. Dobson, J. Brewer, M. Hanscho, et al., Recon 2.2: from reconstruction to model of human metabolism, Metabolomics. 12 (2016) 109. doi:10.1007/s11306-016-1051-4.
[39] I. Kuperstein, E. Bonnet, H.-A. Nguyen, D. Cohen, E. Viara, L. Grieco, et al., Atlas of Cancer Signalling Network: a systems biology resource for integrative analysis of cancer data with Google Maps, Oncogenesis. 4 (2015) e160–14. doi:10.1038/oncsis.2015.19.
[40] M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, K. Morishima, KEGG: new perspectives on genomes, pathways, diseases and drugs, Nucleic Acids Research. 45 (2017) D353–D361. doi:10.1093/nar/gkw1092.
[41] D. Davidi, E. Noor, W. Liebermeister, A. Bar-Even, A. Flamholz, K. Tummler, et al., Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements, Proc. Natl. Acad. Sci. U.S.A. 113 (2016) 3401–3406. doi:10.1073/pnas.1514240113.
[42] D.R. Penas, P. González, J.A. Egea, R. Doallo, J.R. Banga, Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy, BMC Bioinformatics. 18 (2017) 52. doi:10.1186/s12859-016-1452-4.
[43] A.F. Villaverde, A. Barreiro, A. Papachristodoulou, Structural Identifiability of Dynamic Systems Biology Models, PLoS Comput Biol. 12 (2016) e1005153–22. doi:10.1371/journal.pcbi.1005153.
[44] O.-T. Chis, A.F. Villaverde, J.R. Banga, E. Balsa-Canto, On the relationship between sloppiness and identifiability, Mathematical Biosciences. 282 (2016) 147–161. doi:10.1016/j.mbs.2016.10.009.
[45] P.D.W. Kirk, A.C. Babtie, M.P.H. Stumpf, Systems biology (un)certainties, Science. 350 (2015) 386–388. doi:10.1126/science.aac9505.
[46] J.C. Romers, M. Krantz, rxncon 2.0: a language for executable molecular systems biology, bioRxiv (2017) 1–32. doi:10.1101/107136.
[47] J.R. Faeder, M.L. Blinov, W.S. Hlavacek, Rule-based modeling of biochemical systems with BioNetGen, Methods Mol. Biol. 500 (2009) 113–167. doi:10.1007/978-1-59745-525-1_5.
[48] L.A. Chylek, L.A. Harris, C.-S. Tung, J.R. Faeder, C.F. Lopez, W.S. Hlavacek, Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems, WIREs Syst Biol Med. 6 (2014) 13–36. doi:10.1002/wsbm.1245.
[49] R.E. Baker, J.-M. Peña, J. Jayamohan, A. Jérusalem, Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biol. Lett. 14 (2018) 20170660–4. doi:10.1098/rsbl.2017.0660.

[Figure 1 image not reproduced. x-axis: log(species); y-axis: number of model formalisms. Legend: Boolean, stochastic, network-free stochastic, spatial stochastic, ODE, FBA. Labelled models: Karr 2012, Bertaux 2019, Earnest 2015, Thomas 2018, Muenzner 2018, Fröhlich 2018, Sneddon 2011, Bouhaddou 2018.]