Proceedings, Proceedings, 7th 7th IFAC IFAC Conference Conference on on Foundations of Systems Biology in inon Engineering Proceedings,of 7th IFAC Conference Conference on Proceedings, 7th IFAC Foundations Systems Biology Engineering Available online at www.sciencedirect.com Proceedings, 7th IFAC August Conference on Chicago, USA, 5-8, Foundations of Systems Biology Engineering Foundations of Systems Biology in2018 Engineering Chicago, Illinois, Illinois, USA, August 5-8,in 2018 Foundations of Systems Biology in Engineering Chicago, Illinois, USA, August 5-8, 2018 Chicago, Illinois, USA, August 5-8, 2018 Chicago, Illinois, USA, August 5-8, 2018
ScienceDirect
IFAC PapersOnLine 51-19 (2018) 124–125
Model Model Uncertainty Uncertainty Analysis Analysis for for Metabolic Metabolic Model Uncertainty Analysis for Metabolic Network Inference: A Case in Model Uncertainty Analysis forStudy Metabolic Network Inference: A Case Study Network Inference: A Case Study in in Bayesian Model Network Inference: Case Study in Bayesian ModelAAveraging Averaging Bayesian Model Bayesian Model Averaging Averaging Axel Theorell ∗∗ Katharina N¨ oh ∗∗
Axel Theorell ∗∗ Katharina N¨ oh ∗∗ Axel Theorell Theorell Katharina Katharina N¨ N¨ oh h Axel o ∗ Axel Theorell Katharina N¨ oh ∗ ∗ of Bioand Geosciences, IBG-1, Forschungszentrum J¨ u ∗ Institute ulich lich ∗ Institute of Bio- and Geosciences, IBG-1, Forschungszentrum J¨ ∗ Institute of Bioand Geosciences, IBG-1, Forschungszentrum J¨ ulich lich GmbH, Wilhelm-Johnen-Strasse, 52428 J¨ u lich, Germany (e-mail: Institute of Bioand Geosciences, IBG-1, Forschungszentrum J¨ u u lich, Germany (e-mail: ∗ GmbH, Wilhelm-Johnen-Strasse, 52428 J¨ Institute of Bio- and Geosciences, IBG-1,J¨ Forschungszentrum J¨ ulich GmbH, 52428 u lich, Germany (e-mail:
[email protected]) GmbH, Wilhelm-Johnen-Strasse, Wilhelm-Johnen-Strasse, 52428 J¨ u lich, Germany (e-mail:
[email protected]) GmbH, Wilhelm-Johnen-Strasse, 52428 J¨ ulich, Germany (e-mail:
[email protected])
[email protected])
[email protected]) Abstract: A A distinguishing distinguishing feature feature of of systems systems biology biology is is the the interrogation interrogation of of models models as as aa means means Abstract: Abstract: A feature of biology interrogation of aa means of making making predictions predictions or generating generating deeper understanding of the the systems under under study.as However, Abstract: A distinguishing distinguishing feature deeper of systems systems biology is is the the interrogation of models models asHowever, means of or understanding of systems study. Abstract: A distinguishing feature of systems biology is the interrogation of models asHowever, a means of making predictions or set generating deeper understanding of the systems under study. However, when usingpredictions given data to address specific question, unique and provably provably correct model of making or generating deeper understanding of the systems under study. when using aa given data set to address aa specific question, aa unique and correct model of making predictions or generating deeper understanding of the systems under study. However, when using a atogiven given data set to to address address aInstead, specificaquestion, question, a unique unique and provablyformulations correct model model formulation apply is rarely known. large selection of alternative of when using data set a specific a and provably correct formulation apply is rarely known. aInstead, aquestion, large selection of and alternative of when using ato data setcombinatorial to address specific a unique provablyformulations correct model formulation to apply is known. aa large of alternative formulations of varying scopes ensues from composition of selection entities. In this scenario, computational formulation togiven apply is rarely rarely known. Instead, Instead, large selection ofthis alternative formulations of varying scopes ensues from combinatorial composition of entities. In scenario, computational formulation applyusfrom istorarely known. Instead, a large of alternative formulations of varying ensues combinatorial composition of entities. this scenario, computational methodsscopes thattoallow allow make statistically valid inferences andIn predictions, while accounting varying scopes ensues combinatorial composition of selection entities. In this scenario, computational while accounting methods that usfrom to make statistically valid inferences and predictions, varying scopes ensues combinatorial composition of entities. this scenario, computational methods that allow allow usfrom to make make statistically valid inferences andInpredictions, predictions, while accounting for the uncertainty in model formulation are desired. methods that us to statistically valid inferences and while accounting for the uncertainty inusmodel formulation are valid desired. methods that allow to make statistically inferences and predictions, while uncertainty accounting for uncertainty model formulation are We the investigate into in Bayesian Model Averaging (BMA), which accounts for model model for the uncertainty in model formulation are desired. desired. We investigate into Bayesian Model Averaging (BMA), which accounts for uncertainty for the uncertainty in model formulation are desired. We investigate into Bayesian Model Averaging (BMA), which accounts for model uncertainty by considering an ensemble of candidate models instead of a single model instance. To show We investigate an into Bayesian Averaging accounts model To uncertainty by considering ensemble of Model candidate models (BMA), instead ofwhich a single model for instance. show the the We investigate an into Bayesian Averaging (BMA), which accounts for model uncertainty by considering an ensemble of Model candidate models instead of a single single model instance. To show the the computational tractability of BMA, we perform model uncertainty analysis for a realistically by considering ensemble of candidate models instead of a model instance. To show computational tractability of BMA, we perform model uncertainty analysis for a realistically by considering an ensemble ofthe candidate instead of a analysis, single model instance. show the computational tractability of BMA, we models perform modelflux uncertainty analysis foranaa To realistically sized reaction network network fromof domain of metabolic metabolic featuring ensemble of computational tractability BMA, we perform model uncertainty for realistically sized reaction from the domain of flux analysis, analysis featuring ana ensemble of computational tractability of BMA, we perform model uncertainty analysis for realistically sized reaction network from the domain of metabolic flux analysis, featuring an ensemble of millions of models. This is made possible using a Markov Chain Monte Carlo (MCMC) method, sized reaction network from the domain of metabolic flux analysis, featuring an ensemble of millions of models. This from is made using a Markovflux Chain Montefeaturing Carlo (MCMC) method, sized reaction network thepossible domain of metabolic analysis, ensemble of millions of models. This possible using aa Markov Chain Monte method, tailored to handle parameter and model structure uncertainty simultaneously. Toan investigate the millions of handle models. This is is made made possible using Markov Chain Monte Carlo Carlo (MCMC) (MCMC) method, tailored to parameter and model structure uncertainty simultaneously. To investigate the millions of models. This is made possible using a Markov Chain Monte Carlo (MCMC) method, tailored to handle handle parameter andthe model structureproblem, uncertainty simultaneously. To investigate investigate the computational burden of solving multi-model a super-model is created that includes tailored to parameter and model structure uncertainty simultaneously. To the computational burden of solving the multi-model problem, a super-model is created that includes tailored to handle model structure uncertainty simultaneously. To investigate the computational burden of solving solving the multi-model problem, super-model is created created that includes the reactions reactions ofburden allparameter models in and thethe multi-model problem. The computational burden of the the multicomputational of multi-model problem, aa super-model is that includes the of all models in the multi-model problem. The computational burden of multicomputational burden of solving the multi-model problem, a super-model is created that includes the reactions of all models in the multi-model problem. The computational burden of the multimodel problem isallcompared compared tothe that of conventional conventional MCMC inference on the theburden single super-model. super-model. the reactions of is models into multi-model problem. The inference computational of the multimodel problem that of MCMC on single the of is models multi-model problem. The inference computational of the multimodel problem compared to that of conventional MCMC on single super-model. Thereactions comparison yields thein surprising that the multi-model problem is computationally model problem isall compared tothe that of insight conventional MCMC inference on the theburden single super-model. insight that the multi-model problem is computationally The comparison yields the surprising model problem than isyields compared to that of insight conventional MCMC inferenceproblem on the single super-model. The comparison yields the surprising insight that the the multi-model problem is computationally computationally less expensive the single super-model problem. Furthermore, we demonstrate, with the the The comparison the surprising that multi-model is less expensive than thethe single super-model problem. Furthermore, we demonstrate, with The comparison yields surprising insight that the multi-model problem is computationally less expensive than theBMA single super-model problem. Furthermore, we demonstrate, demonstrate, with the the example at hand, that yields valid structural network inferences. less expensive than the single super-model problem. Furthermore, we with example at hand, that valid structural network inferences. less expensive than theBMA singleyields super-model problem. Furthermore, we demonstrate, with the example at that BMA yields valid network inferences. example at hand, hand, that BMA yields valid structural structural network inferences. © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. example at hand, that BMA yields valid structural network inferences. Keywords: Modeling; Modeling; Uncertainty Uncertainty Analysis; Analysis; Bayesian Bayesian Model Model Averaging; Averaging; MCMC; MCMC; Metabolic Metabolic Keywords: Keywords: Modeling; Uncertainty Uncertainty Analysis; Analysis; Bayesian Bayesian Model Model Averaging; Averaging; MCMC; MCMC; Metabolic Metabolic Flux Analysis Keywords: Modeling; Flux Analysis Keywords: Modeling; Uncertainty Analysis; Bayesian Model Averaging; MCMC; Metabolic Flux Analysis Analysis Flux Flux Analysis 1. INTRODUCTION INTRODUCTION Averaging Averaging (BMA) (BMA) is is aa powerful powerful quantification quantification method method for for 1. 1. Averaging (BMA) is isuncertainty a powerful powerful with quantification method for for model formulation strong foundation in 1. INTRODUCTION INTRODUCTION Averaging (BMA) a quantification method in model formulation uncertainty with strong foundation 1. INTRODUCTION Averaging (BMA) is a powerful quantification method for formulation uncertainty with strong foundation in Fundamental to to systems systems biology is is the the use use of of models models to to model probability theory (Hoeting et al., 1999). BMA estimates model formulation uncertainty with strong foundation in Fundamental biology probability theory (Hoeting et al., 1999). BMA estimates model formulation uncertainty with strong foundation in Fundamental to systems biology is the use of models to probability theory (Hoeting et al., 1999). BMA estimates infer parameters and to make predictions. Inferences from the parameter uncertainty that originates from both meaFundamental to and systems biology is the use of models to probability theory (Hoetingthat et al., 1999). BMA estimates the parameter uncertainty originates from both meainfer parameters to make predictions. Inferences from Fundamental to and systems biology isused the to use of models to the probability theory (Hoeting et al., 1999). BMA estimates infer parameters to make make predictions. Inferences from the parameter uncertainty that originates from both measuch investigations are routinely guide, for exsurement errors and uncertainty in the model formulation, infer parameters and to predictions. Inferences from parameter uncertainty that originates from both meaerrorsuncertainty and uncertainty in the model formulation, such investigations are routinely used toInferences guide, forfrom ex- surement infer parameters to make predictions. parameter that originates from both measuch investigations are routinely used guide, surement errorsinference and uncertainty uncertainty in the the model model formulation, ample, metabolicand engineering strategies (Stephanopoulos by performing for all candidate models and such investigations are routinely used to to guide, for for exex- the surement errors and in formulation, by performing inference for all candidate models and avavample, metabolic engineering strategies (Stephanopoulos such investigations are routinely used to guide, for exsurement errors and uncertainty in the model formulation, ample, metabolic engineering strategies (Stephanopoulos by performing inference for all candidate models and et al., al., 1998). 1998). Clearly, in these these processes, the choice choice of of by eraging over the inferences. In principle, BMA could be ample, metabolic engineering strategies (Stephanopoulos performing inference for all candidate models and avavet Clearly, in processes, the eraging over the inferences. In principle, BMA could be ample, metabolic engineering strategies (Stephanopoulos by performing inference for all candidate models and avet 1998). in processes, the of eraging over inferences. In principle, BMA could be theal., model is of ofClearly, crucial importance. Nevertheless, systems in reaction networks for testing network features, for et al., 1998). Clearly, in these these processes, the choice choice of used eraging over the the inferences. In principle, BMA could be used in reaction networks for testing network features, for the model is crucial importance. Nevertheless, systems et al., 1998). Clearly, in these processes, the choice of eraging over the inferences. In principle, BMA could be the model is of crucial importance. Nevertheless, systems used in reaction networks for testing network features, for biologists often consider the models as given, without example, whether a certain pathway is metabolically active the model is of crucial importance. Nevertheless, systems used in reaction networks for testing network features, for example, whether a certain pathway is metabolically active biologists often consider the models as given, without the model is of crucial importance. Nevertheless, systems used in reaction networks for testing network features, for biologists often consider the models as given, without example, whether certain pathway is metabolically metabolically active explicitly considering considering the possibility possibility of as alternative formu- example, or not, assuming that the reaction biologists often consider the modelsof given, without whether aa certain pathway is active explicitly the alternative formuor not, without without assuming that the precise precise reaction network network biologists often consider the models as given, without example, whether aacertain pathway is the metabolically active explicitly considering the possibility of alternative formuor not, without assuming that the precise reaction network lations. Consequently, neglecting that wrong model fortopology is known priori. However, statistical strinexplicitly considering the possibility ofa alternative formuor not, without assuming that the precise reaction network lations. Consequently, neglecting that a wrong model fortopology is known a priori. However, the reaction statistical strinexplicitly considering the possibility ofaajudgments alternative formuor not, without assuming that the precise network lations. Consequently, neglecting that wrong model fortopology is known a priori. However, the statistical strinmulation might be used, risks biasing towards gency of BMA comes at the price of high computational lations. Consequently, neglecting that wrong model fortopology is known a priori. However, the statistical strinmulation might be used, risks biasing judgments towards gency of BMA comes at the price of high computational lations. Consequently, neglecting that a wrong model fortopology is known a priori. However, the statistical strinmulation might in be used, risks biasing judgments towards gency of BMA comes at the price of high computational over-confidence the results. effort: BMA problems are computationally involved by mulation might be used, risks biasing judgments towards gency of BMA comes at the price of high computational effort: BMA problems are computationally involved by over-confidence in the results. mulation might be used, risks biasing judgments towards gency of BMA comes at the price of high computational over-confidence in the results. effort: BMA problems are computationally involved by nature, and evade analytic solutions for all, but trivial over-confidence in the results. effort: BMA problems are computationally involved by nature, and evade analytic solutions for all, but trivial Metabolic reaction networks form a common model type over-confidence in the results.form a common model type nature, effort: BMA problems aretocomputationally involved by Metabolic reaction networks and evade analytic solutions for all, but trivial cases. This has given rise approximation methods for nature, and evade analytic solutions for all, but trivial cases. This has given rise to approximation methods for Metabolic networks aa common model in systems systems reaction biology. Uncertainty Uncertainty in the the formulation of type such nature, Metabolic reaction networks form form common model type and evade analytic solutions for all, but trivial in biology. in formulation of such cases. This has given rise to approximation methods for BMA problems, in particular Markov Chain Monte Carlo cases. problems, This has given rise to Markov approximation methods for Metabolic reaction networks form a ofcommon model in particular Chain Monte Carlo in systems biology. in formulation of type such BMA networks leads to aaUncertainty large number number candidate models, in systemsleads biology. Uncertainty in the the of such cases. Thismethods. has given rise to Markov approximation methods for networks to large of formulation candidate models, BMA problems, in particular particular Markov Chain Monte Monte Carlo (MCMC) BMA problems, in Chain Carlo in systems biology. Uncertainty in the formulation of such (MCMC) methods. networks leads to a large number of candidate models, since the number of model topologies increases combinanetworks leads to ofa model large number of increases candidatecombinamodels, BMA problems, in particular Markov Chain Monte Carlo since the number topologies (MCMC) methods. methods. networks leads large number of candidate models, since the number of topologies combinatorially with theto number of reactions reactions toincreases be considered. considered. For (MCMC) The aim this since thewith number ofa model model topologies increases combina(MCMC) The aim of ofmethods. this work work is is to to propose propose aa method method that that makes makes torially the number of to be For since the number of model topologies increases combinatorially with the number of reactions to be considered. For The aim of this work is to propose a method that makes makes the quantification of the uncertainty in such scenarios, a BMA in metabolic reaction networks computationally torially with the number of reactions to be considered. For The aim of this work is to propose a method that BMA in metabolic reaction networks computationally the quantification of the uncertainty in such scenarios, a torially with the number reactions to considered. For aim ofmetabolic thisa work is to propose a method that makesa the quantification of the the of uncertainty inbe such scenarios, a The BMA in metabolic reaction networks computationally number of computational techniques have been suggested tractable. As means to this end, we implemented the quantification of uncertainty in such scenarios, a BMA in reaction networks computationally As a means to thisnetworks end, we computationally implemented a number of computational techniques have been suggesteda tractable. the quantification the uncertainty in Kirk such scenarios, BMA metabolic reaction number computational techniques have been suggested As means to this end, a (Conti etof al., 2003;ofBabtie Babtie et al., al., 2014; 2014; et al., 2013; 2013; tractable. tailoredinMCMC MCMC technique called Reversible Jump Markov Markov number ofal., computational techniques have been suggested tractable. As aa technique means tocalled this Reversible end, we we implemented implemented a tailored Jump (Conti et 2003; et Kirk et al., number of computational techniques have been suggested tractable. As a means to this end, we implemented a (Conti et al., 2003; Babtie et al., 2014; Kirk et al., 2013; tailored MCMC technique called Reversible Jump Markov Sunn˚ aker et al., 2013). Among these, Bayesian Model Chain Monte Carlo (RJMCMC) (Green, 1995) for model (Conti et al., 2003; Babtie et al., 2014; Kirk et al., 2013; tailoredMonte MCMC technique called Reversible Jump Markov Chain Carlo (RJMCMC) (Green, 1995) for model Sunn˚ aker et al., 2013). Among these, Bayesian Model (Conti et al., 2003;2013). BabtieAmong et al., these, 2014; Kirk et al.,Model 2013; tailored MCMC technique called Reversible Jump Markov Sunn˚ aker et al., Bayesian Chain Monte Carlo (RJMCMC) (Green, 1995) for model uncertainty analysis. Sunn˚ aker et al., 2013). Among these, Bayesian Model Chain Monte Carlo (RJMCMC) (Green, 1995) for model uncertainty AT was the Grant DYNAMICS (Grant No. AT aker was supported supported by the ERA-IB ERA-IB Grant DYNAMICS (Grant No. Sunn˚ et al., by 2013). Among these, Bayesian Model Chain Monteanalysis. Carlo (RJMCMC) (Green, 1995) for model uncertainty analysis. uncertainty analysis. ERA-IB-14-81). AT AT was was supported supported by by the the ERA-IB ERA-IB Grant Grant DYNAMICS DYNAMICS (Grant (Grant No. No. ERA-IB-14-81). uncertainty analysis. AT was supported by the ERA-IB Grant DYNAMICS (Grant No. ERA-IB-14-81).
ERA-IB-14-81). ERA-IB-14-81). 2405-8963 © IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Copyright © 2018, 2018 IFAC 124 Copyright 2018 responsibility IFAC 124Control. Peer review©under of International Federation of Automatic Copyright © 2018 IFAC 124 Copyright © 2018 IFAC 124 10.1016/j.ifacol.2018.09.010 Copyright © 2018 IFAC 124
IFAC FOSBE 2018 Chicago, Illinois, USA, August 5-8, 2018
Axel Theorell et al. / IFAC PapersOnLine 51-19 (2018) 124–125
2. BAYESIAN MODEL AVERAGING BMA is a technique for drawing inferences about a quantity of interest ∆, in view of the data D and the possible models {Mi }N i=1 . In our application, ∆ is a binary variable, indicating whether a certain metabolic reaction is active or not. The posterior probability of ∆, p(∆|D), is then expressed by: N p(∆|Mi , D) · p(Mi |D), (1) p(∆|D) = i=1
where p(Mi |D) is the probability of the ith model (Hoeting et al., 1999). When modeling reaction networks, the network Mi depends on a vector of unknown parameter values θi , for example, the in vivo reaction rates. To apply Eq. (1), these parameters are incorporated by marginalization: N p(∆|D) = p(∆|Mi , θi , D) · p(Mi , θi |D)dθi . (2) i=1 θ
125
could quantify how many samples were needed to achieve results that were reproducible over consecutive algorithm executions with random starting points. The results were reproducible after 104 samples, containing approximately 103 of the 106 models. To put this into perspective, the RJMCMC convergence rate was compared to the convergence rate of a conventional MCMC sampler, that performs parameter inference on the super-model that contained all the pathways of all the models in the BMA model ensemble. Interestingly, the RJMCMC sampler, addressing the combined problem of measurement uncertainty and model formulation uncertainty, showed faster convergence than the standard MCMC approach, that only addressed measurement uncertainty within the supermodel. We attribute the increased convergence speed to the fact that MCMC convergence rates slow down when the dimensionality of the parameter space increases and the models considered by RJMCMC had lower average dimensionality (21.5 dimensions) than the super-model (34 dimensions).
i
Due to its inherent complexity, Eq. (2) is solved approximately, for instance, by means of sampling based methods such as MCMC (Brooks et al., 2011). 3. REVERSIBLE JUMP MARKOV CHAIN MONTE CARLO According to Eq. (2), BMA requires knowledge of the simultaneous probability distribution of models and parameter values, p(Mi , θi |D). Attempts to approximate p(Mi , θi |D) using standard MCMC fail, due to the nonconsistent dimensionality of the parameter space, since θi may change dimensionality when switching between different models. RJMCMC is a tailored MCMC method, specifically designed to address those problems where the dimensionality of the parameter space may change (Green, 1995). In RJMCMC the Markov Chain states consist of a model with an associated vector of parameter values. In this work, an RJMCMC sampler was formulated which uses a mixture of two Metropolis-Hastings transition densities: One transition that updates parameters within the current model and one transition that updates the model and the parameters simultaneously. Since no suitable implementations of RJMCMC were at hand, the sampler was implemented in C++. 4. RESULTS The aim of this work is to propose a computational method that makes BMA computationally tractable for application to metabolic reaction networks. To this end, we made a case study to investigate the tractability of RJMCMC for an example of realistic complexity. Specifically, we addressed the metabolic flux inference problem in 13 C Metabolic Flux Analysis (Wiechert, 2001) with an E. coli model from literature (Crown et al., 2015). Starting from a super-model, over 106 unique metabolic network models, each containing a unique subset of reaction reversibilities out of the full set of reaction reversibilities of the supermodel, were considered. By calculating the potential scale reduction for repeated runs of the RJMCMC sampler (Gilks et al., 1995), we 125
5. CONCLUSIONS BMA, computed using RJMCMC, enables model inference, without assuming that the correct model formulation is known. Due to its wide applicability and computational tractability, it holds the potential to be an engine for generating biological knowledge. REFERENCES Babtie, A.C., Kirk, P., and Stumpf, M.P. (2014). Topological sensitivity analysis for systems biology. Proceedings of the National Academy of Sciences, 111(52), 18507– 18512. Brooks, S., Gelman, A., Jones, G., and Meng, X.L. (2011). Handbook of Markov Chain Monte Carlo. CRC Press. Conti, D.V., Cortessis, V., Molitor, J., and Thomas, D.C. (2003). Bayesian modeling of complex metabolic pathways. Human Heredity, 56(1-3), 83–93. Crown, S.B., Long, C.P., and Antoniewicz, M.R. (2015). Integrated 13 C-metabolic flux analysis of 14 parallel labeling experiments in Escherichia coli. Metabolic Engineering, 28, 151–158. Gilks, W.R., Richardson, S., and Spiegelhalter, D. (1995). Markov chain Monte Carlo in practice. CRC Press. Green, P.J. (1995). Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711–732. Hoeting, J.A., Madigan, D., Raftery, A.E., and Volinsky, C.T. (1999). Bayesian model averaging: a tutorial. Statistical Science, 382–401. Kirk, P., Thorne, T., and Stumpf, M.P. (2013). Model selection in systems and synthetic biology. Current Opinion in Biotechnology, 24(4), 767–774. Stephanopoulos, G.N., Aristidou, A.A., and Nielsen, J. (1998). Metabolic Engineering: Principles and Methodologies. Academic Press. Sunn˚ aker, M., Zamora-Sillero, E., Dechant, R., Ludwig, C., Busetto, A.G., Wagner, A., and Stelling, J. (2013). Automatic generation of predictive dynamic models reveals nuclear phosphorylation as the key msn2 control mechanism. Science Signaling, 6(277). 13 Wiechert, W. (2001). C metabolic flux analysis. Metabolic Engineering, 3(3), 195–206.