Mario R. Eden, Marianthi Ierapetritou and Gavin P. Towler (Editors) Proceedings of the 13th International Symposium on Process Systems Engineering – PSE 2018 July 1-5, 2018, San Diego, California, USA © 2018 Elsevier B.V. All rights reserved. https://doi.org/10.1016/B978-0-444-64241-7.50412-2
A System Identification Enhanced Phenotype Phase Plane Analysis Matthew Hilliarda, Andrew Damiania, Q. Peter Hea and Jin Wanga* a
Dept. of Chemical Engineering, Auburn University, Auburn, AL, 36849, US
[email protected]
Abstract Genotype-phenotype relationship is fundamental to biology, and predicting different phenotypes based on the sequenced genome is one of the main goals for genome-scale metabolic model development. Phenotype phase plane (PhPP) analysis is a powerful tool to provide a global perspective on the genotype-phenotype relationship, and to help characterize different metabolic phenotypes. However, the traditional PhPP analysis is based on shadow price analysis, which provides limited information to characterize different phenotypes. In this work, we propose a system identification (SID) enhanced PhPP, which not only obtains information that could be obtained through shadow price analysis, but also provides additional information that helps characterize different phenotypes. The effectiveness of the SID-PhPP approach is demonstrated using an illustrative cell model and a core metabolic network model of E. coli. Keywords: Phenotype phase plane analysis, System identification, Genome-scale metabolic model, Shadow price.
1. Introduction Systems biology is a rapidly evolving discipline that seeks to determine how complex biological systems function by integrating experimentally derived information through mathematical modeling. Genome-scale metabolic network models represent the link between the genotype and phenotype of the organism. In essence, a genome-scale metabolic model (GEM) is a comprehensive functional database of the organism’s cellular metabolism (King et al., 2015; Sánchez and Nielsen, 2015), which consists of a set of metabolites, metabolic reactions (i.e., stoichiometric matrix), and constraints. These models provide a holistic view of the organism’s metabolism, and constraintbased metabolic flux analyses, such as flux balance analysis (FBA), have been used extensively to study genome-wide cellular metabolic networks (Kauffman et al., 2003). One major application of GEM is to predict different growth phenotypes (e.g., how fast cells grow, what products are excreted) in various genetic and environmental conditions. Developed by the Palsson lab, phenotype phase plane (PhPP) analysis is a powerful tool that uses FBA with the GEM to provide a global perspective on the genotype-phenotype relationship, and to help characterize different metabolic phenotypes (Edwards et al., 2002). In PhPP analysis, the classification of different phenotypes is based on the shadow price of various metabolites, where a phenotype is defined by all the conditions where a metabolite’s shadow price stays constant. However, shadow prices only describe how each metabolite affects the objective function of FBA, without providing any information on how different reactions interact with each other within the same phenotype. Therefore, when PhPP is applied to analyze GEMs, it is usually combined with other in silico simulations and gene deletions to
M. Hilliard et al.
2504
identify and explain the fundamental differences among different phenotypes (Duarte et al., 2004; Edwards et al., 2001). To address this challenge, the system identification (SID) based framework that we previously developed for genome-scale metabolic model analysis (Damiani et al., 2015) is extended to enhance PhPP, which is termed SID-enhanced PhPP (SID-PhPP) analysis. In the SID framework, we first perturb the metabolic network through designed input sequences, i.e., designed in silico experiments; then we apply multivariate statistical analysis tools such as principal component analysis (PCA) to analyze the in silico results in order to extract information on how such perturbations propagate through the network; finally, we visualize the extracted knowledge against the network map to provide meaningful and intuitive interpretations of the in silico results. The SID framework has been successfully applied to genome-scale model evaluation, as well as guiding the development of an improved genome-scale model of Scheffersomyces stipites (Damiani et al., 2015; Damiani et al., 2017). In this work, we first briefly review the traditional PhPP analysis and introduce the SID- PhPP method, using the same illustrative example provided in the original PhPP work (Edwards et al., 2002); then we use the core metabolic network model of E. coli to demonstrate the effectiveness of the proposed SID-PhPP approach, and show how a “hidden” phenotype that share the same set of shadow prices with another phenotype could be identified by the SID-PhPP approach.
2. Traditional PhPP Analysis Metabolic phenotypes can be viewed as different metabolic network utilization profiles. Within each phenotype, all points (culture conditions) share the same set of activated pathways, and the same set of excreted products. In other words, the activated reactions and/or excreted products are different among different phenotypes. The traditional PhPP analysis provides a global view of each phenotype in genome-scale models by breaking down the metabolic behavior in distinct phases through shadow price analysis. ௗ Mathematically, shadow price of metabolite i is defined as ߛ ൌ െ ೡ where Z is the ௗ
objective function of FBA and ܾ௩ denotes the additional availability for the metabolite (Edwards et al., 2002). In the traditional PhPP analysis, a phenotype is defined by all the culture conditions, defined by two independent variables (typically the carbon and oxygen source), where metabolites’ shadow prices stay constant. Carbon Rx9 ATP
Oxygen
Rx1
Rx2 A
O2
ATP
Rx3
NADH
Rx8 B
2 ATP
2 ATP
3 NADH Rx4
10 NADH
Rx6
Rx5
3D ATP Rx7
C Rx11
2 NADH
Rx12
Rx13
3E
10 ATP Rx10
Biomass[c]
(a)
(b)
Figure 1. (a) The network map of an illustrative cell model; (b) Phenotype phase plane of the cell model.
A System Identification Enhanced Phenotype Phase Plane Analysis
2505
Table 1. Shadow prices of different metabolites in different phenotypes P1 P2 P3 P4
Carbon -1.30 -0.21 -0.05 0.50
O2 0.10 -0.17 -0.23 -0.50
ATP 0.00 -0.09 -0.09 0.00
B -1.30 -0.30 -0.14 0.50
NADH -0.10 -0.01 0.05 0.50
C -1.00 -0.09 -0.09 -1.00
D -0.33 0.00 0.00 -0.33
E -0.40 -0.04 0.00 0.00
Biomass 0.00 0.00 0.00 0.00
Here we use a simplified cell model to illustrate how PhPP analysis works. Figure 1 (a) shows the network map of the illustrative example, which was first introduced in (Edwards et al., 2002). Figure 1 (b) plots “phenotype phase plane” of the model cell, which exhibits 4 different phenotypes (i.e., P1~P4). The boundaries between different phenotypes are determined by the shadow prices, with a few of them listed in Table 1. It is worth noting that, according to PhPP, a zero shadow price indicates the metabolite would be secreted because it has no effect on the objective function (Edwards et al., 2002).
3. The SID Enhanced PhPP Analysis The SID-PhPP is proposed in this work as an alternative approach to characterize different phenotypes. Figure 2 provides an overview of the SID-PhPP. The same illustrative example from Section 2 is used to demonstrate how SID-PhPP works. First, the designed one-dimensional in silico experiments are conducted to determine the boundaries of the different phenotype phases. As shown in Figure 3 (a), one set of designed experiments is to hold carbon uptake flux constant while varying oxygen update. Next, PCA was applied to analyze the resulted flux matrixes, and the resulted scores corresponding to the first principal component (PC) are plotted in Figure 3 (b). Each linear segment in the score plot indicates one phenotype, and Figure 3 shows that the phenotype boundaries determined through SID-PhPP exactly match those determined through shadow price analysis. It is worth noting that SID-PhPP does not require running simulations that cover the whole plane. Instead, only few sets of onedimensional in silico experiments are needed to determine the phenotype boundaries. Flux
In silico exp. with 1D perturbation
Score Principal
component analysis (PCA)
Identified phenotype phases
Loading
Identified active network(s)
How reactions affected by perturbation
Figure 2. Overview of the SID enhanced PhPP
In the next section, we use the E. coli core model as an example to illustrate how SIDbased framework can help characterize different phenotypes, as well as to discover “hidden” phenotypes that share the same set of shadow prices with another phenotype.
M. Hilliard et al.
2506
(a)
(b)
Figure 3. (a) Designed in silico experiments at fixed carbon uptake and varying oxygen uptake; (b) Phenotype boundaries determined through PC scores.
4. Application to E. coli core model 4.1. E. coli core model The central carbon metabolic model of E. coli proposed by Orth et al. (2010) contains 95 reactions and 72 metabolites, with major reaction pathways including glycolysis, both branches of the pentose phosphate pathway (PPP), TCA cycle, and electron transport chain. The two substrates chosen for this study are glucose and oxygen. Five by-products could be excreted are CO2, lactate, formate, ethanol, and acetate. The biomass reaction is used as the objective function. 4.2. The traditional PhPP analysis First, the traditional PhPP was applied to predict phenotypes based on the E. coli core model, which results in 4 different phenotypes (P1~P4) as shown in Figure 4 (a), with different by-products excreted (Table 2). The shadow prices of a few key metabolites are listed in Table 3. It is worth noting that although formate has shadow price of 0 for all 4 phenotypes, it is only excreted in P3 and P4, not in P1 or P2. For ethanol, although its shadow price is not zero for P4, FBA simulations show that ethanol is excreted in P4; similarly for acetate, although its shadow price is not zero for any phenotypes, it is excreted in P3 and P4 confirmed by FBA simulations. This finding indicates that the statement about zero shadow price by Edwards et al. (2002) is inaccurate. Table 2 By-products excreted in different phenotypes Phase P1 P2 P3 P4
By-products CO2 CO2 Acetate, Formate, CO2 Acetate, Formate, Ethanol
Table 3 Shadow prices of some key metabolites in different phenotypes Metabolite Glucose Oxygen CO2 Formate Acetate Ethanol
P1 -0.1373 0.0229 0 0 -0.0458 -0.0686
P2 -0.0459 -0.0230 0 0 -0.0026 -0.0128
P3 -0.0325 -0.0325 0 0 -0.0027 -0.0054
P4 -0.0304 -0.0359 -0.0014 0 -0.0028 -0.0028
A System Identification Enhanced Phenotype Phase Plane Analysis
2507
4.3. The SID-enhanced PhPP analysis
20
20
18 Infeasible
18 Infeasible
16
16
14
14
12
P1
10
Oxygen Flux
Oxygen Flux
SID-PhPP is performed to analyze the same model, with results shown in Figure 4 (b). It can be seen that SID-PhPP identified 5 different phenotypes. Three of them (P1, P2 and P4) are the same as those identified by the traditional PhPP, while P3 identified by the traditional PhPP is split into two phenotypes (P3’ and P3”) by SID-PhPP. It is confirmed that the shadow prices of different metabolites in P3’ and P3’’ are exactly the same as those of P3 listed in Table 3. According to the traditional PhPP, P3’ and P3” should be the same phenotype. However, SID-PhPP analysis shows that, for in silico experiments conducted in P3, the resulted scores form two linear segments, indicating two different phenotypes. To determine whether the two linear segments represent two different phenotypes, we visualized the SID-PhPP analysis results against the network map for P3’ and P3’’ in Figure 5 (a) and (b), respectively. Figure 5 clearly shows that P3’ and P3’’ are two distinct phenotypes, with different activated reaction pathways (oxidative PPP active in P3’, while inactive in P3”) and different by-product excretion patterns (acetate only in P3’, while acetate and formate in P3”), although they share the same set of shadow prices.
P2
8
P3
10
P3'
6
4
4
2
4 6 Glucose Flux
(a)
P3''
2
P4 0
P2
8
6
2
P1
12
8
10
0
P4 2
4 6 Glucose Flux
8
10
(b)
Figure 4. (a) The traditional PhPP reveals four different phenotypes; (b) The SID-PhPP reveals five different phenotypes.
5. Conclusions Genome-scale metabolic network models represent the link between the genotype and phenotype of the organism, and phenotype phase plane analysis is a prevalent method to characterize different phenotypes predicted by genome-scale model. However, certain limitations exist for the original PhPP analysis approach. For example, as we have shown using the E. coli core model, the metabolites with zero shadow prices are not always excreted, while the metabolites with non-zero shadow prices could be excreted. This is because the shadow prices only quantify the effect of the overall flux of a metabolite on the FBA objective function, without taking the interactions among different activated pathways into account. However, it is the interactions among different activated pathways that determine the function and behaviour of a given phenotype. To address this limitation, we propose SID enhanced PhPP in this work. In the SID-PhPP, the designed in silico experiments are conducted to determine the boundaries between different phenotypes with significantly reduced computation. More importantly, SID-PhPP provides additional information that help fully characterize
M. Hilliard et al.
2508
different phenotypes. The effectiveness of the SID-PhPP approach is demonstrated using an illustrative example and a core metabolic network model of E. coli. Our results show that SID-PhPP is a powerful tool for revealing the metabolic capability of a metabolic network model, and enables better and more intuitive systems level understanding which provides the foundation for improved strain development. G6P
6PGL
6PGC
F6P FBP
DHAP
GAP
Ru5P
G6P
Xu5P P
R5P
GAP
S7P
E4P
F6P
6PGC
FBP
DHAP
GAP
CO2
Biomass
2PG
S7P
E4P
F6P
Biomass
2PG
O2
O2
PEP
PEP
PYR
PYR
Formate
Formate AcCoA
AcALD
OAA AA
MAL M CoQH2
AcCoA
EtOH AC
AC-P
CIT
ATP
R5P
GAP
3PG
3PG
ADP
Ru5P
Xu5P P
13BPG
13BPG
CO2
6PGL
F6P
OAA AA
ICIT AKG
GLO
L-Glut
ADP
ATP
MAL M
C SuccCoA
FUM
NADH
Succ
AcALD
CoQH2
NADP(+)
EtOH AC
AC-P
CIT ICIT AKG
GLO
L-Glut
C SuccCoA
FUM
NADH
Succ
NADP(+) NAD(+)
NAD(+)
CoQ
CoQ
NADH
NADH
NADPH N
(a)
NADPH N NAD(+)
NAD(+)
(b)
Figure 5. Different activated pathways and by-product excretion patterns in P3’ (a) and P3’’ (b) with an increasing oxygen uptake, where green arrows indicating increasing flux, red arrows indicating decreasing flux.
References A Damiani, QP He, J. Wang, 2017. A System Identification Based Framework for Genome-Scale Metabolic Model Validation and Refinement. In: . Proc. 10th IFAC Int. Symp. Adv. Control. Chem. Process., pp. 13013–13018. AL Damiani, QP He, TW Jeffries, J. Wang, 2015. Comprehensive evaluation of two genome-scale metabolic network models for Scheffersomyces stipitis. Biotechnol. Bioeng. 112:1250–1262. NC Duarte, BØ Palsson, P Fu. 2004. Integrated analysis of metabolic phenotypes in Saccharomyces cerevisiae. BMC Genomic- 5. JS Edwards, RU Ibarra, BO Palsson. 2001. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 19:125–130. JS Edwards, R Ramakrishna, BO Palsson. 2002. Characterizing the metabolic phenotype: a phenotype phase plane analysis. Biotechnol. Bioeng. 77:27–36. KJ Kauffman, P Prakash, JS Edwards. 2003. Advances in flux balance analysis. Curr. Opin. Biotechnol. 14:491–496. ZA King, CJ Lloyd, AM Feist, BO Palsson. 2015. Next-generation genome-scale models for metabolic engineering. Curr. Opin. Biotechnol. 35:23–29. JD Orth, RM Fleming, BO Palsson. 2010. Reconstruction and use of microbial metabolic networks: the core Escherichia coli metabolic model as an educational guide. EcoSal plus. BJ Sánchez, J Nielsen. 2015. Genome scale models of yeast: towards standardized evaluation and consistent omic integration. Integr. Biol. 7:846–858.