Journal of Biotechnology 119 (2005) 87–92
Short communication
Development of a software tool for in silico simulation of Escherichia coli using a visual programming environment☆
Sung Gun Lee a, Cheol Min Kim b,c,∗∗, Kyu Suk Hwang a,∗
a Department of Chemical Engineering, College of Engineering, Pusan National University, 30 Jangjeon-dong, Geumjeong-gu, Busan 609-735, South Korea
b Department of Biochemistry, College of Medicine, Pusan National University, 10 Ami-Dong1-Ga, Seo-Gu, Busan 602-739, South Korea
c Department of Biomedical Informatics and Genomic Medicine, Medical Research Institute, Pusan National University, 10 Ami-Dong1-Ga, Seo-Gu, Busan 602-739, South Korea
Received 2 December 2004; received in revised form 31 March 2005; accepted 6 April 2005
Abstract
This study describes the development of a software tool, EcoSim, that assists users in implementing quantitative in silico simulation easily. It consists of four parts: an extracellular environment and constraints setting mode, a table for the optimal metabolic flux distribution with a chart for changes of substrate concentration, a dynamic flux distribution viewer and a dynamic hierarchical regulatory network viewer. A hierarchical regulatory network for the central metabolism of Escherichia coli was represented with defined modeling symbols and weights. All programming procedures for EcoSim were accomplished in a visual programming environment (LabVIEW). To illustrate quantitative in silico simulation with EcoSim, the program was applied to E. coli grown on glucose and acetate as carbon sources. The simulation results were in agreement with experimental data obtained from the literature. EcoSim can be used to assist biologists and engineers in predicting and interpreting the dynamic behavior of E. coli under a variety of environmental conditions.
© 2005 Elsevier B.V. All rights reserved.
Keywords: EcoSim; LabVIEW; Hierarchical regulatory network; In silico simulation
☆ A manual for simulating behaviors of E. coli under two carbon sources, glucose and acetate, is on the web: http://home.pusan.ac.kr/~ecosim.
∗ Corresponding author. Tel.: +82 51 510 2400; fax: +82 51 512 8563.
∗∗ Co-corresponding author.
E-mail address: [email protected] (K.S. Hwang).
0168-1656/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jbiotec.2005.04.013
1. Introduction
Predicting cellular behavior under various environmental conditions and understanding how genes work at the genomic scale are essential for biotechnological research and application. Achieving these goals requires the construction of biological model systems.
The recent flood of experimental data on gene function and regulatory interactions has provided the knowledge required for the construction of in silico models through high-throughput computational analysis (Covert et al., 2001; Mendoza and Alvarez-Buylla, 1998; Reed and Palsson, 2003; Oh et al., 2002). Among in silico modeling approaches, the constraints-based approach has proven effective in the construction of in silico models based on metabolic pathways, the stoichiometry of metabolic reactions, and mass balances around the metabolites under the steady-state assumption (Covert and Palsson, 2003; Lee and Papoutsakis, 1999). Recently, Covert and co-workers have attempted to incorporate regulatory constraints into this constraints-based approach, because such constraints have a significant effect on the behavior of an organism. These regulatory constraints serve as temporary flux constraints that prevent two or more inconsistent regulatory events from occurring simultaneously under defined environmental conditions (Covert et al., 2001; Covert and Palsson, 2002, 2003; Kauffman et al., 2003; Reed and Palsson, 2003). The transcriptional regulatory constraints are logic equations expressed as production rules (if–then rules) and Boolean logic. This Boolean rule-based system describes well the causal relationships among genes, substrates and products, and converts the on/off state of gene expression under various environmental conditions into rules. However, the constraints-based model with Boolean logic does not show how the control of gene expression takes place, because it is difficult to trace the application of rules in a Boolean rule-based system (Winston, 1992). Moreover, to calculate the optimal metabolic flux distribution, which maximizes the growth flux and predicts the behavior of a cell system, a linear programming package (LINDO) and a spreadsheet package (EXCEL) have had to be used together.
These two software packages are limited in the visual representation of simulation results, and it can be difficult for a user to see the effects of changing the model inputs. For these reasons, we describe here the development of an integrated software tool, EcoSim, that meets the demands mentioned above for constraints-based metabolic models. A hierarchical regulatory network modeled with defined symbols was constructed in EcoSim to represent the gene control mechanisms that respond to changes in the extracellular environment. All programming procedures were accomplished in a visual programming environment (LabVIEW), including the development of a graphical user interface to assist users in implementing quantitative in silico simulations easily.
2. Material and methods
2.1. Visual programming
EcoSim (Version 1.0) was written in LabVIEW 7.1 (National Instruments Ltd.) and run on an IBM computer with 1 GByte of RAM. The Linear Programming Simplex Method Sub.VI in LabVIEW was used to calculate the optimal flux distribution and the maximal growth rate. Maximal growth was used as the objective function in this study: linear optimization was carried out with this objective, and the optimal metabolic flux distribution was taken as the flux distribution that maximized the growth flux.
2.2. Fundamental modeling symbols and weight
Modeling symbols are the units for modeling hierarchical regulatory networks, which represent the regulatory mechanisms of genes under environmental change. Symbols are divided into biological symbols, which include operator, structural gene, promoter, regulatory protein and effector, and non-biological symbols, which include and gate, or gate and weight distributor (Table 1). Weight represents the degree of connectivity between symbols and is used to determine the active/inactive states of symbols. It is useful in conditions where the active/inactive state of a symbol cannot be determined with a binary system (Fig. 1).
2.3. Central metabolic network of Escherichia coli
In this study, the previously studied central metabolic network of E. coli (Schilling et al., 2001) was taken and expanded. This network is constructed from the reactions of central metabolism. We added 13 regulatory proteins in EcoSim, which act in aerobic/anaerobic conditions and in glucose, lactose, succinate, acetate, ethanol and tryptophan metabolism.
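The linear optimization of Section 2.1 over the network of Section 2.3 can be sketched in the notation commonly used for flux balance analysis. The symbols S, v, c, α and β are assumptions introduced here for illustration; they are not taken from EcoSim.

```latex
\begin{aligned}
\max_{v}\quad & \mu = c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for every reaction } j .
\end{aligned}
```

Here S would be the stoichiometric matrix of the network, v the vector of reaction fluxes, c a vector that selects the growth flux, and α_j and β_j the lower and upper bounds that carry the uptake limits. Under this reading, a regulatory constraint on a reaction whose gene is switched off corresponds to setting α_j = β_j = 0.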
Table 1
Fundamental modeling symbols
Function (each function is paired with a graphical symbol in the Symbol column)
Biological symbols: operator; structural gene; promoter (RNA polymerase); regulatory protein (or transcription factor); effector (inducer, repressor); gene transcription; negative control; positive control
Non-biological symbols: weight distributor; and gate; not gate; or gate; symbol state (active/inactive)
The symbols were combined to represent the operon as the basic structure: regulatory protein, operator, promoter and structural genes. These operons were included in the hierarchical regulatory networks.
Fig. 1. Change of symbol state with weight. The state of a symbol, active or inactive, was determined by the propagation of weight and of the Boolean symbols for weight through an arc. A weight above 0 represented an active state and a weight below 0 an inactive state. (a) Inactive condition of a symbol. (b) Active condition of a symbol. (c) Application of weight to the binary system.
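The rule in Fig. 1 (weight above 0 is active, weight below 0 is inactive) can be illustrated with a minimal Python sketch. The gate semantics used here (minimum for the and gate, maximum for the or gate, sign flip for the not gate), the handling of a weight of exactly 0, and the function `structural_gene_weight` are assumptions made for illustration; the source does not specify how EcoSim combines weights.

```python
def is_active(weight: float) -> bool:
    """Active when the weight is above 0 (Fig. 1).

    A weight of exactly 0 is treated as inactive here; that is an assumption.
    """
    return weight > 0


# Hypothetical gate semantics on weights (assumptions, not taken from EcoSim).
def and_gate(*weights: float) -> float:
    return min(weights)  # positive only if every input is positive


def or_gate(*weights: float) -> float:
    return max(weights)  # positive if any input is positive


def not_gate(weight: float) -> float:
    return -weight  # flips the sign, and therefore the state


def structural_gene_weight(promoter: float, repressor_bound: float) -> float:
    """Toy negatively controlled operon: promoter AND NOT repressor."""
    return and_gate(promoter, not_gate(repressor_bound))


if __name__ == "__main__":
    for repressor in (+1.0, -1.0):
        w = structural_gene_weight(promoter=+1.0, repressor_bound=repressor)
        print(f"repressor weight {repressor:+.1f} -> gene active: {is_active(w)}")
```

With these assumed semantics the signed weight plays the role of the binary on/off value, while its magnitude remains available to express a degree of connectivity.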
2.4. Dynamic simulation
To quantitatively predict cellular behavior, flux balance analysis (FBA), a standard differential equation for predicting substrate concentration, and an iterative algorithm were used (Varma and Palsson, 1994a,b). Dynamic simulation was carried out under the steady-state assumption that fluxes are constant within each time interval. For a cell in given environmental conditions, the up- or down-regulation of every regulated gene was determined by the hierarchical regulatory network. These determined gene states acted as constraints in FBA: if a gene was down-regulated, the fluxes of the corresponding reactions were constrained to 0. A delay time of 0.5 h was added when a gene state changed from down-regulation to up-regulation. After the regulatory constraints were applied to the system, the optimal flux distribution and maximal growth rate were obtained from FBA. The resulting maximal growth rate was used in the standard differential equations to predict the time profiles of the consumed substrates, and an iterative algorithm was used to predict the concentrations for the next step.
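The iteration described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the EcoSim implementation: the FBA step is replaced by a fixed-yield stand-in (`toy_growth_rate`, with hypothetical yield values), the only regulatory rule is a hypothetical one that keeps acetate uptake off while glucose is present, and the interval length `DT_H` is chosen arbitrarily. The 0.5 h delay and the default concentrations and uptake limits are the values quoted in this paper.

```python
import math

DELAY_H = 0.5  # delay for a change from down- to up-regulation (from the text)
DT_H = 0.1     # length of one interval in hours (assumption)
EPS = 1e-9     # tolerance for "substrate is exhausted" and time comparisons


def toy_growth_rate(q_glc: float, q_ac: float) -> float:
    """Stand-in for the FBA step; the yields (g DCW/mmol) are hypothetical."""
    return 0.09 * q_glc + 0.03 * q_ac


def simulate(x=0.0015, glc=10.4, ac=4.0, q_glc_max=10.5, q_ac_max=2.0, t_end=12.0):
    """Return a list of (time h, biomass g/l, glucose mM, acetate mM)."""
    profile = [(0.0, x, glc, ac)]
    ac_on_at = None  # time at which the acetate-uptake genes switch on
    for step in range(round(t_end / DT_H)):
        t = step * DT_H
        # Regulatory step (hypothetical rule): acetate genes are off while
        # glucose is present and switch on DELAY_H after it is exhausted.
        if glc > EPS:
            ac_on_at = None
        elif ac_on_at is None:
            ac_on_at = t + DELAY_H
        ac_genes_on = ac_on_at is not None and t >= ac_on_at - EPS
        # Uptake bounds, limited by the substrate left for this interval.
        q_glc = min(q_glc_max, glc / (x * DT_H))
        q_ac = min(q_ac_max, ac / (x * DT_H)) if ac_genes_on else 0.0
        mu = toy_growth_rate(q_glc, q_ac)
        # Integral of exp(mu * s) over the interval; fluxes are held constant.
        growth = math.expm1(mu * DT_H) / mu if mu > 0 else DT_H
        glc = max(glc - q_glc * x * growth, 0.0)
        ac = max(ac - q_ac * x * growth, 0.0)
        x *= math.exp(mu * DT_H)
        profile.append((t + DT_H, x, glc, ac))
    return profile


if __name__ == "__main__":
    for t, x, glc, ac in simulate()[::10]:
        print(f"t={t:5.1f} h  biomass={x:.4f} g/l  glucose={glc:6.3f} mM  acetate={ac:5.3f} mM")
```

Within each interval the fluxes are held constant, so biomass grows exponentially and substrate decreases in proportion to the integrated biomass; this mirrors the steady-state assumption stated above. Replacing `toy_growth_rate` with a real linear program over the stoichiometric network would be the natural next step.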
3. Results and discussion
3.1. Program overview
3.1.1. Extracellular environment and constraints setting mode
The extracellular environment setting mode specifies the conditions under which cells grow. It has eight checkboxes. Each checkbox indicates the presence of
aerobic/anaerobic conditions, glucose, lactose, succinate, acetate and tryptophan. The oxygen uptake rate was set to 15 mM/g DCW h for the aerobic condition and to 0 mM/g DCW h for the anaerobic condition (Varma and Palsson, 1994a). These eight environmental settings are linked with the dynamic hierarchical regulatory network. Therefore, when a user checks an extracellular condition in the environment setting mode, the initial on/off state of the genes for the simulation is defined. The maximal uptake rates and the initial cell density of E. coli have to be specified to calculate an optimal metabolic flux distribution; the user only needs to enter these values in the constraint boxes. These specifications complete the initial conditions for the simulation.
3.1.2. Table for optimal flux distribution and chart for substrate concentration change
EcoSim provides a table showing the optimal flux distribution obtained by optimizing a desired physiological property (maximal growth rate). Changes in the distribution caused by the global control mode can be compared directly in the table, and changes of substrate concentration with time are displayed on the chart. This makes it possible to understand the metabolic and physiological changes of a cell under various environmental conditions. The chart (Fig. 2(c)) shows the changes of substrate concentration obtained by repeatedly solving a standard differential equation (Varma and Palsson, 1994a).
3.1.3. Dynamic flux distribution viewer
EcoSim provides a dynamic flux distribution viewer to display metabolic reaction pathways. This viewer automatically shows the metabolic pathways together with the flux distribution results, which can change under various environmental conditions.
3.1.4. Dynamic hierarchical regulatory network viewer
EcoSim provides a dynamic hierarchical regulatory network viewer to display how regulatory proteins control genes under environmental change.
This network has a hierarchical structure consisting of the stimulus level, modulon level, regulon/operon level, gene level and metabolic pathway level (Fig. 2(a)). The stimulus, located at the highest level, acts as a detector that senses the extracellular environment. If O2, glucose and lactose are present in the medium, the stimuli responding to them are turned on. If a stimulus is on, biochemical reactions at the pathway level, the lowest level, are turned on by a gene control mechanism. These reactions are specified as the initial values for the simulation. Finally, the on/off times of each gene are displayed by the simulation. Users can obtain time-series data for the genes by pressing the arrow buttons to the left of the index display, which shows the on/off times of gene expression under environmental change.
3.2. Prediction of central metabolic gene expression in acetate compared with glucose
EcoSim was used to predict the up/down-regulation of the central metabolic genes in a medium containing both glucose and acetate. As parameters for the simulation, the initial concentrations (biomass: 0.0015 g/l, estimated; glucose: 10.4 mM, estimated; acetate: 4.0 mM) and the uptake rate constraints (glucose: 10.5 mM/g DCW h, estimated; acetate: 2.0 mM/g DCW h, estimated; oxygen: 15 mM/g DCW h) were obtained from the literature (Varma and Palsson, 1994a). All procedures for implementing the quantitative in silico simulation of E. coli are presented on the web (http://home.pusan.ac.kr/~ecosim). The simulation results for 25 genes related to acetate metabolism in the hierarchical regulatory network showed that aceB, aceA, aceK, acnA, acs, fumA, icdA, mdh, ppsA, sdhA, sdhB, sdhC, sdhD, sucA and sucB were up-regulated, and adhE, aceE, aceF, crr, fumB, pgk, ptsG, ptsH, ptsI and pykA were down-regulated. The up-regulated genes are involved in acetate uptake, the glyoxylate shunt, the TCA cycle and gluconeogenesis, whereas the down-regulated genes are related to glucose uptake and glycolysis. When the up/down-regulated genes from EcoSim were compared with the fold-changes of central metabolic gene expression reported by Oh et al. (2002), all genes were in agreement except fumB. Of the 25 genes, aceA, aceB, acs, ppsA, aceE, aceF, adhE, pgk, ptsG, ptsH, ptsI, crr and pykF showed high fold-changes.
EcoSim predicted that these genes would all change from up-regulation to down-regulation (or from down-regulation to up-regulation). acnA, fumA, fumB, icdA, mdh, sdhA, sdhB, sdhC, sdhD, sucA and sucB showed low fold-changes. EcoSim also showed that the up/down-regulation of these genes did not change between glucose metabolism and acetate metabolism.
EcoSim hierarchically displays how genes are controlled under environmental changes. When glucose metabolism was changed to acetate metabolism, it was not straightforward to identify the regulatory protein responsible for the change in gene expression (Oh et al., 2002). However, the dynamic hierarchical regulatory network in EcoSim graphically represented the functions of the regulatory proteins. In particular, acs, ppsA, aceB, aceA and aceK, which show a major difference between glucose-grown and acetate-grown E. coli, were up-regulated by the cAMP–CRP complex, FruR, FadR and IclR. Oh et al. (2002) suggested that maeB was up-regulated by both acetate metabolism and FadR. Based on this suggestion, EcoSim showed that maeB can also be up-regulated when ethanol and tryptophan are metabolized as carbon sources.
In conclusion, we developed an integrated software tool, EcoSim, with graphical user interfaces comprising an extracellular environment and constraints setting mode, a table of optimal flux distributions with a chart of substrate time profiles, a dynamic flux distribution viewer and a dynamic hierarchical regulatory network viewer. EcoSim will be able to assist users in implementing quantitative in silico simulations easily. A logic system such as EcoSim may also provide a good tool for (1) high-throughput technologies such as microarrays, which require predictions of transcriptional regulation under given environmental conditions, (2) metabolic engineering, to produce the desired biochemical species more efficiently by considering multiple mechanisms (Covert et al., 2001), and (3) research examining the relationship between environmental conditions and genetic regulation, which is hard to determine experimentally (Covert et al., 2001).
Fig. 2. Dynamic behavioral prediction of glucose- and acetate-grown E. coli using EcoSim. (a) A part of the hierarchical regulatory network simulated by EcoSim. It shows how regulatory proteins control genes with time (white, up-regulation of a gene or activity of a symbol; dark gray, down-regulation of a gene or inactivity of a symbol). (b) Up- or down-regulation of genes and regulatory proteins. The data in the table show the fold-changes of gene expression in acetate compared with glucose obtained from Oh et al. (2002) and Oh and Liao (2000). (c) Time profiles of glucose (solid line) and acetate (dashed line).
Acknowledgement
This work was supported by Brain Korea 21 Project in 2003.
References
Covert, M.W., Schilling, C.H., Palsson, B.O., 2001. Regulation of gene expression in flux balance models of metabolism. J. Theor. Biol. 213, 73–88.
Covert, M.W., Palsson, B.O., 2002. Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J. Biol. Chem. 277, 28058–28064.
Covert, M.W., Palsson, B.O., 2003. Constraints-based models: regulation of gene expression reduces the steady-state solution space. J. Theor. Biol. 221, 309–325.
Kauffman, K.J., Prakash, P., Edwards, J.S., 2003. Advances in flux balance analysis. Curr. Opin. Biotechnol. 14, 491–496.
Lee, S.Y., Papoutsakis, E.T., 1999. Metabolic Flux Balance Analysis. Marcel Dekker, New York, pp. 13–57.
Mendoza, L., Alvarez-Buylla, E.R., 1998. Dynamics of the genetic regulatory network for Arabidopsis thaliana flower morphogenesis. J. Theor. Biol. 193, 307–319.
Oh, M.-K., Rohlin, L., Kao, C.K., Liao, J.C., 2002. Global expression profiling of acetate-grown Escherichia coli. J. Biol. Chem. 277, 13175–13183.
Oh, M.-K., Liao, J.C., 2000. Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli. Biotechnol. Prog. 16, 278–286.
Reed, J.L., Palsson, B.O., 2003. Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185, 2692–2699.
Schilling, C.H., Edwards, J.S., Letscher, D., Palsson, B.O., 2001. Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems. Biotechnol. Bioeng. 71, 286–306.
Varma, A., Palsson, B.O., 1994a. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild type Escherichia coli W3110. Appl. Environ. Microbiol. 60, 3724–3731.
Varma, A., Palsson, B.O., 1994b. Metabolic flux balancing: basic concepts, scientific and practical use. Nat. Biotech. 12, 994–998.
Winston, P.H., 1992. Artificial Intelligence, third ed. Addison Wesley, MA, USA, 119–137.