Review
TRENDS in Biochemical Sciences
Vol.28 No.5 May 2003
Metabolic pathways in the post-genome era
Jason A. Papin1, Nathan D. Price1, Sharon J. Wiback1, David A. Fell2 and Bernhard O. Palsson1
1 Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093-0412, USA
2 School of Biological & Molecular Sciences, Oxford Brookes University, Oxford OX3 0BP, UK
Metabolic pathways are a central paradigm in biology. Historically, they have been defined on the basis of their step-by-step discovery. However, the genome-scale metabolic networks now being reconstructed from annotation of genome sequences demand new network-based definitions of pathways to facilitate analysis of their capabilities and functions, such as metabolic versatility and robustness, and optimal growth rates. This demand has led to the development of a new mathematically based analysis of complex metabolic networks that enumerates all their unique pathways, taking into account all requirements for cofactors and byproducts. Applications include the design of engineered biological systems, the generation of testable hypotheses regarding network structure and function, and the elucidation of properties that cannot be described by simple descriptions of individual components (such as product yield, network robustness, correlated reactions and predictions of minimal media). Recently, these properties have also been studied in genome-scale networks. Thus, network-based pathways are emerging as an important paradigm for analysis of biological systems.

The high-throughput experimental technologies that have rapidly developed within genomic science allow us to obtain comprehensive data about the molecular make-up of cells, thus enabling us to reconstruct large-scale (even genome-scale) reaction networks [1]. These reaction networks incorporate diverse datasets including genome annotations, biochemical characterizations and cell physiology experiments. With these reconstructed models, analytical frameworks are being developed that can mathematically describe cellular functions including growth and regulation [2,3]. These systemic functions arise from the interaction of a multitude of metabolites and enzymes, and are not readily described or understood by a piece-wise characterization of the individual components.
Corresponding author: Bernhard O. Palsson ([email protected]).

In this context, network-based metabolic pathways attempt to mathematically describe systemic functions from the existing knowledge of the cellular components and their elaborate connectivity, and have therefore evolved from traditional metabolic pathway definitions. From years of research, biochemical experimentation has defined stoichiometries of many reactions (Fig. 1a). With the cataloging of multiple reactions, shared metabolites have been grouped and traditional pathways have been described (Fig. 1b). These traditional pathways include glycolysis, the pentose phosphate pathway and the tricarboxylic acid (TCA) cycle. With the more complete set of data that has arisen with genome annotation efforts, all the components of a metabolic network can be identified and described. With this level of description, mathematically defined pathways enable the analysis of a complete metabolic system (Fig. 1c). These network-based metabolic pathways are often more complex than their traditional counterparts. However, they correspond to precise mathematical descriptions of cellular properties [e.g. cellular growth and defined reaction FLUX (see Glossary) levels with given substrate inputs] and account for mass balance and thermodynamic constraints.

To date, traditional metabolic pathways [4] have served as conceptual frameworks for research and teaching. These pathways provide an important means of effective
[Figure 1: "Evolution of network-based pathways". Panels (a), (b) and (c) show reactions among metabolites A, B, C, D and E, with cofactors (cof) and byproducts (byp), inside a system boundary.]
Fig. 1. Development of the network-based pathway paradigm. (a) With advanced biochemical techniques, years of research have led to the precise characterization of individual reactions. As a result, the complete stoichiometries of many metabolic reactions have been characterized. (b) Most of these reactions have been grouped into ‘traditional pathways’ (e.g. glycolysis) that do not account for cofactors and byproducts in a way that lends itself to a mathematical description. With sequenced and annotated genomes, models can be made that account for many metabolic reactions in an organism. (c) Subsequently, network-based, mathematically defined pathways can be analyzed that account for a complete network (note that black and gray arrows correspond to active and inactive reactions, respectively).
http://tibs.trends.com 0968-0004/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0968-0004(03)00064-1
communication regarding metabolism of various organisms, and will continue to be an important reference and basis for network reconstruction. Although useful, these traditional pathway definitions do not provide for quantitative, systemic evaluations of biological reaction networks because they focus on subsets of networks without consideration of network-wide interactions. Often, they have been discovered and defined in cells in which they represent significant metabolic activities (e.g. glycolysis in yeast and the TCA cycle in aerobic muscle), but the metabolic pathway is then treated as a universal concept and mistakenly applied to cells with different enzyme levels and metabolic profiles. However, queries of comprehensive databases of traditional pathways have still yielded interesting numerical results [5,6]. For example, the number of reactions catalyzed by multiple enzymes, the number of enzymes that catalyze multiple reactions and the number of reactions in traditional pathways were recently calculated for the metabolic network of Escherichia coli [7]. These statistical analyses of traditional pathways can provide interesting comparisons between organisms and can be an important means of evaluating whether functions are present and/or absent in different metabolic networks.

Recently, several mathematical approaches for defining metabolic pathways have emerged. These approaches analyze the SYSTEMS PROPERTIES of large networks and move beyond the traditional pathway definitions present in current biochemistry textbooks. These include GRAPH THEORY definitions, which involve the description of metabolic networks in the rich language of graph theory from computer science. A promising approach within graph theory involves Petri nets that have many parallels with existing metabolic pathway analyses, but, to date, this analysis has only been used for limited cases [8]. In addition, HEURISTICALLY DEFINED PATHWAYS have been successfully applied to metabolic networks [9,10]. Network-based ELEMENTARY MODES [11,12] and EXTREME PATHWAYS [13,14] uniquely and mathematically define biochemical pathways directly from network topology (i.e. the structure of the metabolic network characterized by its reactions and metabolites). Thus, they are unambiguously defined without observer biases that might distort systematic comparisons between different metabolic networks and their properties.

A useful construct for describing these network-based metabolic pathways is presented in Fig. 2. First, a reaction network is defined using genomic, biochemical, physiological and other datasets (Fig. 2a) [1]. Then, this network is represented by a STOICHIOMETRIC MATRIX with elements corresponding to the stoichiometric coefficients, relating reactions and metabolites (Fig. 2b). This matrix is analyzed by an algorithm that defines a valid set of metabolic pathways (e.g. elementary modes or extreme pathways). For conceptual purposes, these pathways can then be visualized as vectors in a high-dimensional space whose axes correspond to the flux levels through the individual reactions (Fig. 2c). This space has as many dimensions as there are reactions in the metabolic network and the numerical value on a given axis is the flux level in the corresponding reaction. The flux cone depicted is the set of points that corresponds to allowable flux values in the metabolic network. If every reaction were allowed any flux level, then this cone would be bounded by the axes. However, the reactions have defined stoichiometries and, consequently, certain FLUX DISTRIBUTIONS are not allowed. The edges of the flux cone are unique extreme pathways that provide mathematical descriptions of the range of flux levels that are allowed.

The mathematical foundation and unique features of these network-based pathways enable evaluation of network properties such as product yield, NETWORK ROBUSTNESS and correlated reactions. Recently, these analyses have been performed at a genome scale [15–18]. Thus, elementary modes and extreme pathways are likely to play a growing role in our analysis of the complex biochemical reaction networks that are being unraveled at an ever-increasing pace.

Glossary
Convex solution cone: A mathematically defined space (whose axes correspond to reaction fluxes) that encompasses all valid steady-state flux distributions of a metabolic network.
Elementary modes: A unique set of non-decomposable enzyme sets that characterizes the convex solution cone.
Exchange flux: A reaction flux that crosses the system boundary.
Extreme pathways: A minimal and unique set of pathways (a subset of elementary modes) that defines the edges of the convex solution cone.
Flux: The production or consumption of mass per unit area per unit time.
Flux distribution: A particular set of reaction fluxes in a network.
Graph theory: A branch of computer science with a rich history that has been used to describe reaction networks (e.g. Petri nets of small metabolic networks).
Heuristically defined pathways: Pathways that are defined through the application of a series of rules connecting substrates to products.
Internal flux: A reaction flux within defined system boundaries.
Network dead ends: Metabolites that are not fully connected to the reaction network, perhaps indicating incomplete characterization of involved reactions; used interchangeably with 'gaps'.
Network robustness: The ability of the metabolic network to respond to changes in its structure (e.g. enzyme deletions).
Non-decomposable: A pathway characteristic in which if any active flux is zero, then the steady-state flux through the entire pathway is zero.
Pathway redundancy: A measure of the number of equivalent pathways with respect to identical inputs and outputs.
Singular value decomposition (SVD): A mathematical method to transform matrices; used for determining key properties of a matrix.
Steady state: The network state in which all metabolite concentrations are constant, although there might be flux through the reactions.
Stoichiometric matrix: A matrix with reaction stoichiometries as columns and metabolite participations as rows.
Systems properties: Biological properties that emerge from the interaction of the network components (e.g. minimal medium requirements and product yield); used interchangeably with 'network properties' and 'emergent properties'.

[Figure 2: panels (a) Reaction network, (b) Creation of stoichiometric matrix (indexed by reactions and metabolites) and (c) Pathway analysis (axes: flux vA, flux vB, flux vC).]
Fig. 2. Network-based pathways mathematically characterize the functions of a metabolic network. As a framework for understanding what these network-based pathways can characterize, we begin with a given reaction network (e.g. E. coli metabolism). Genome annotations, biochemical experiments and cell physiology data provide information to describe all reactions within a system. (a) A reaction network is created from diverse data sets by defining all the different reactions in an organism. EXCHANGE FLUXES cross the system boundary; INTERNAL FLUXES are within the system boundary. (b) These data are then used to create a stoichiometric matrix that relates all of the reactions within a network to all of the participating metabolites in a given organism. The stoichiometric matrix is an important part of the in silico model. With the matrix, extreme pathway and elementary mode analyses can be used to generate a unique set of pathways. (c) In a high-dimensional flux space in which each axis corresponds to the flux through a given reaction, the network-based pathways define the limits to the possible steady-state flux distributions that a network can achieve. All possible flux distributions of a metabolic network lie within the 'cone' circumscribed by the pathways.

History of network-based metabolic pathway analyses
Network-based metabolic pathway analysis has a relatively short history.
A mathematical formalism for analyzing biochemical pathways was developed in the late 1980s [9]; from a database that contained enzyme and metabolite information, the algorithm generated a set of pathways that transformed a selected substrate to a selected product. However, this approach was restricted to single substrates and products, and did not guarantee that all possible pathways were generated. After overcoming some of the difficulties associated with this algorithm, pathway analysis was further advanced in 1990 [10]; this improved algorithm generated a set of pathways that could account for multiple reactants and products. However, it required an initial classification of metabolites (e.g. metabolites were categorized into required substrates and products, allowed substrates and products, excluded substrates and products, intermediates). This up-front classification narrowed the scope of the approach: different pathways could be generated based on which classification scheme was used.

To create a more general approach to pathway analysis, techniques from linear algebra were applied to models of metabolic networks. The linear basis of a stoichiometric matrix was suggested as a set of pathways that would characterize all the possible routes through a metabolic network [19,20]. This approach was applied within the well-studied framework of metabolic control analysis [21,22], and was used to describe kinetic and structural features of metabolic networks [19,20]. Whereas linear bases as pathways are independent and all possible flux distributions can be described by a linear combination of the basis pathways, the corresponding pathways are not a unique set (by definition, linear bases that describe the null space of a matrix are non-unique [23]). In addition, to decompose a given solution into the resultant vectors, the vectors can be used in either a positive or negative direction. However, some metabolic reactions are effectively irreversible and the use of such reactions in an opposite direction limits the validity of linear bases as metabolic pathways.

To account for the effective irreversibility of many reactions and to create a unique set of systemic pathways, convex analysis has been applied to biochemical systems. Convex analysis is a branch of mathematics that can be used to analyze systems of linear inequalities [24]; it can be used to overcome the shortcomings associated with effective reaction directionality (which can be formulated as linear inequality constraints) and non-uniqueness [24]. Initially, convex analysis was used to study stability in chemical reaction networks and led to the definition of systemic pathways called extreme currents [25,26]. Recently, the mathematics of convex analysis was utilized in the development of an algorithm that generates a unique set of pathways called elementary modes [11]. In the flux space described (Fig. 2c), the elementary modes exist in a lower dimension than that of extreme currents because of a specific formulation of the algorithm [11]. In using convex analysis for extreme currents and elementary modes, linear independence is lost, but uniqueness of basis vectors is gained. This uniqueness enables the evaluation of invariant systems properties (e.g. calculating the maximum theoretical yield of a particular product).
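The contrast between a linear basis and a convex basis can be written compactly. The notation below is an assumption made for illustration and does not appear in the article: $S$ is the stoichiometric matrix, $v$ a steady-state flux vector, $b_k$ the vectors of a linear (null-space) basis and $p_k$ the vectors of a convex basis. Reversible reactions are assumed to be split into forward and backward parts so that every flux can be taken as non-negative.

```latex
\begin{align}
\text{linear basis:} \quad & v = \sum_k w_k\, b_k,
  \qquad S\, b_k = 0, \qquad w_k \in \mathbb{R} \\
\text{convex basis:} \quad & v = \sum_k \alpha_k\, p_k,
  \qquad S\, p_k = 0, \qquad p_k \ge 0, \qquad \alpha_k \ge 0
\end{align}
```

In the first line a weight $w_k$ may be negative, which can drive an irreversible reaction backwards. In the second line all weights are non-negative, so any combination of the $p_k$ keeps every reaction running in its allowed direction.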
Although elementary mode analysis enumerates all distinct metabolic routes, extreme pathway analysis focuses on enumerating the unique and minimal set of convex basis vectors needed to describe all allowable steady-state flux distributions through the metabolic network [14]. Thus, extreme pathways define the edges of the convex cone (Fig. 2c) that contains all the possible flux distributions in the metabolic network and are a subset of the elementary modes. Moreover, in contrast to extreme currents, extreme pathways can account for bi-directional exchange reactions; consequently, there will never be fewer extreme currents than extreme pathways for any network topology. Given these rigorous mathematical frameworks, further analysis into the in vivo capabilities and uses of a network can be achieved. Importantly, these descriptions are unique and, thus, characterize invariant properties of a metabolic network.

Elementary modes and their irreducible subset, the extreme pathways
Three characteristics define network-based pathway analyses (elementary modes and extreme pathways): (1) both approaches require that the pathways be NON-DECOMPOSABLE (i.e. if an active flux in a non-decomposable pathway is restricted to zero then the steady-state flux through the entire pathway must be zero); (2) both approaches to pathway analysis result in a unique set; (3) extreme pathway analysis additionally requires systemic independence (i.e. there are no extreme pathways that can be represented by non-negative linear combinations of other extreme pathways). Consequently, extreme pathways are a minimal set of elementary modes (see Box 1 for a simple reaction network that illustrates these defining characteristics). The additional elementary modes (which represent non-negative linear combinations of the extreme pathways) might lie on the surface or the interior of the CONVEX SOLUTION CONE [12]. Another important feature of these approaches is that extreme pathways and elementary modes encapsulate fundamental biological constraints of mass balance and simple thermodynamics (reaction directionality). An important result of these defining characteristics is that the elementary modes and extreme pathways can describe all possible steady-state flux distributions of a metabolic network. Because the approaches assume steady-state conditions, they also characterize time-invariant properties of metabolic networks.

For both approaches, metabolites and reactions are classified with reference to the system and thermodynamic constraints. Extreme pathways and elementary modes require the classification of metabolites as either internal or external to the system. It is also necessary to define reactions as either reversible or irreversible. When all exchange reactions are constrained to be irreversible (e.g. glucose can only go into and ammonia can only go out of the system), the extreme pathways and the elementary modes effectively result in the same set of pathways. For many reaction networks with bi-directional exchange reactions, there are more elementary modes than extreme pathways. However, if a metabolite is reversibly exchanged with the environment and it is only connected to irreversible internal reactions, the elementary modes and extreme pathways might be equivalent. More specific details on the algorithmic differences between these two approaches have been previously described [12–14].

Importantly, these network-based pathways describe the conversion of substrates into products. The elements in the system are balanced, and all cofactor pools are maintained at STEADY STATE. This characteristic is an important distinction from traditional pathways; the extreme pathways and elementary modes account for all the resources and reaction steps a network must use for anabolic and catabolic processes. In Fig. 3, a sample metabolic network is presented. In this network there are seven metabolites and nine reactions. The stoichiometric matrix is formulated, the extreme pathways and elementary modes are calculated, and the resultant three pathways are then diagrammed. This network shows how cofactors, byproducts and all metabolites must be appropriately accounted for to ensure that mass is balanced. All
Box 1. Elementary modes and extreme pathways
There are three extreme pathways (ExPa) and four elementary modes (ElMo) for the simple reaction network illustrated in Fig. I. Note that the extreme pathways and elementary modes satisfy the two shared requirements: each set of pathways is non-decomposable and each set of pathways is unique. In addition, the extreme pathways are systemically independent; note that ElMo 1 (ExPa 1) and ElMo 2 (ExPa 2) can be combined to give ElMo 4. ElMo 3 and ElMo 4 immediately show that there are two possible modes that continue to exist if the reversible exchange of A is prevented by mutation or inhibition. These two elementary modes are the extreme pathways for the reaction network without the exchange of A. It is important to note that this simple network does not illustrate the effects of cofactors, nor does it represent more complex chemical reactions (which would significantly increase the complexity of the analyses). Rather, this simple network serves merely as an example of one way in which elementary modes and extreme pathways differ from each other. See [11,13,14] for more detailed algorithmic descriptions.
Governing constraints of elementary modes and extreme pathways
(1) Mass balance at steady state (equality): $S \cdot v = 0$, where $S$ is the stoichiometric matrix with rows as metabolites and columns as reactions and $v$ is a vector corresponding to fluxes through each reaction.
(2) Thermodynamics, reaction directionality (inequality): $v_i \geq 0$, where $v_i$ is the flux through reaction $i$. All irreversible reactions proceed in the appropriate direction. In extreme pathway analysis, reversible reactions are decoupled into their forward and backward components, and both are subject to this constraint. Reaction directionality is maintained in elementary mode analysis through a series of algorithmic steps.
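The two governing constraints can be checked directly for a candidate flux vector. The sketch below is a minimal illustration under stated assumptions: the six-reaction toy network, the reaction names (`bA`, `bB`, `bC`, `v1`, `v2`, `v3`) and the helper functions are hypothetical and are not taken from the article or from any published tool.

```python
# Hypothetical toy network (an assumption for illustration only):
#   bA: <-> A   reversible exchange of A
#   bB:  -> B   uptake of B
#   bC: C ->    secretion of C
#   v1: B -> A, v2: A -> C, v3: B -> C   internal reactions
REACTIONS = ["bA", "bB", "bC", "v1", "v2", "v3"]
REVERSIBLE = {"bA"}  # every other reaction is treated as irreversible

# Stoichiometric matrix: rows are metabolites A, B, C; columns follow REACTIONS.
S = [
    [1, 0, 0, 1, -1, 0],    # A
    [0, 1, 0, -1, 0, -1],   # B
    [0, 0, -1, 0, 1, 1],    # C
]


def is_mass_balanced(S, v):
    """Constraint (1): S.v = 0 for every internal metabolite."""
    return all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)


def respects_directionality(v):
    """Constraint (2): v_i >= 0 for every irreversible reaction."""
    return all(x >= 0 or name in REVERSIBLE for name, x in zip(REACTIONS, v))


def is_feasible(S, v):
    return is_mass_balanced(S, v) and respects_directionality(v)


route_b_to_c = [0, 1, 1, 0, 0, 1]     # B taken up, converted by v3, C secreted
export_a = [-1, 1, 0, 1, 0, 0]        # B taken up, converted by v1, A leaves via bA
backwards = [0, -1, -1, 0, 0, -1]     # balanced, but runs irreversible steps in reverse
unbalanced = [0, 1, 0, 0, 0, 1]       # C is produced but never secreted

for label, v in [("route_b_to_c", route_b_to_c), ("export_a", export_a),
                 ("backwards", backwards), ("unbalanced", unbalanced)]:
    print(label, is_feasible(S, v))
```

Integer fluxes are used so that the equality test is exact; with measured (floating-point) fluxes a tolerance would be needed instead of `== 0`.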
[Figure I: panels (a) Reaction network, (b) Extreme pathways (ExPa 1 to ExPa 3) and (c) Elementary modes (ElMo 1 to ElMo 4), drawn for a network of metabolites A, B and C.]
Fig. I. Elementary modes and extreme pathways in a reaction network.
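For a network as small as the one in Box 1, elementary modes can be enumerated by brute force. The sketch below is not the algorithm of [11,13,14]; it relies on the standard characterization of elementary modes as the minimal-support null-space vectors that respect reaction directionality, and simply tries every subset of reactions. The network is a hypothetical one chosen to resemble Box 1 (three metabolites, a reversible exchange of A); its reaction names and stoichiometry are assumptions, not a transcription of Fig. I. The approach scales exponentially and is only meant for toy examples.

```python
from fractions import Fraction
from itertools import combinations


def null_space(rows):
    """Basis of the null space of a matrix, by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivot column of each reduced row, in row order
    for col in range(n_cols):
        r = len(pivots)
        if r == len(m):
            break
        hit = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if hit is None:
            continue
        m[r], m[hit] = m[hit], m[r]
        m[r] = [x / m[r][col] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for r, p in enumerate(pivots):
            vec[p] = -m[r][free]
        basis.append(vec)
    return basis


def elementary_modes(S, names, reversible):
    """Enumerate elementary modes of a small network by testing every support."""
    modes = []
    for size in range(1, len(names) + 1):
        for support in combinations(range(len(names)), size):
            sub = [[row[j] for j in support] for row in S]
            basis = null_space(sub)
            # Keep supports whose null space is a single ray with no zero entry:
            # these are the non-decomposable candidates.
            if len(basis) != 1 or any(x == 0 for x in basis[0]):
                continue
            for sign in (1, -1):  # orient the ray to respect irreversibility
                flux = [sign * x for x in basis[0]]
                if all(x > 0 or names[j] in reversible
                       for x, j in zip(flux, support)):
                    modes.append({names[j]: x for x, j in zip(flux, support)})
                    break
    return modes


# Hypothetical network: bA <-> A, bB -> B, bC: C ->, v1: B -> A, v2: A -> C, v3: B -> C
REACTIONS = ["bA", "bB", "bC", "v1", "v2", "v3"]
REVERSIBLE = {"bA"}
S = [
    [1, 0, 0, 1, -1, 0],    # A
    [0, 1, 0, -1, 0, -1],   # B
    [0, 0, -1, 0, 1, 1],    # C
]

MODES = elementary_modes(S, REACTIONS, REVERSIBLE)
for mode in MODES:
    print({name: str(flux) for name, flux in mode.items()})
```

For this assumed network the enumeration yields four modes, and the mode through `v1` and `v2` is the sum of the two modes that use the reversible exchange `bA` in opposite directions, which mirrors the relationship between ElMo 1, ElMo 2 and ElMo 4 described in Box 1.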
[Figure 3: a sample reaction network inside a system boundary (metabolites A, B, C, D, E, byp and cof; internal fluxes v1 to v6; exchange fluxes b1 to b3), its stoichiometric matrix S, the pathway matrix P with columns P1, P2 and P3, and diagrams of the three pathways.]
Fig. 3. Network-based pathways of a sample reaction network with cofactors and byproducts. First, the reaction network is represented by a stoichiometric matrix (S) with rows representing the participation of metabolites in reactions and columns as the stoichiometric coefficients for the individual reactions. This matrix is analyzed with pathway analysis (elementary modes or extreme pathways). These pathways can be represented in a matrix (P) where the rows represent the fluxes through corresponding reactions and the columns are the resultant pathways. These pathways can be illustrated for simple networks (P1, P2, P3).
reactions must be used in their effective directions; any possible flux distribution in the network can then be written as a non-negative linear combination of these three pathways. With this basis, direct applications to observed phenotypes can be interpreted. The technique of metabolic flux analysis [27] measures the flux patterns in a cell through isotope tagging of particular metabolites; the possibility of decomposing a particular measured flux distribution into constituent extreme pathways or elementary modes allows for a functional and mathematical description of the flux pattern within a living cell [28]. Thus, these network-based pathways can represent network functions that correspond to observed phenotypic behavior.

Analysis of network properties
Many relevant biological properties have been characterized, and novel hypotheses generated, through the application of elementary modes and extreme pathways. An increasing number of metabolic networks have been studied, including the human red blood cell [29,30] and the microorganisms E. coli [3,31,32], Haemophilus influenzae [16,17], Helicobacter pylori [15,18], Methylobacterium extorquens [33] and Saccharomyces cerevisiae [34,35].
Some of the key properties studied using network-based pathways are schematically described in Fig. 4. For example, minimal medium requirements have been defined for H. influenzae and H. pylori using extreme pathways [15,16] (Fig. 4a). The computed minimal medium composition is highly consistent with experimental results. Reactions that are not connected to the metabolic networks in genome-scale models of H. pylori, H. influenzae and a model of Mycoplasma hominis have also been analyzed with extreme pathways and elementary modes [15,16,36] (Fig. 4b). These NETWORK DEAD ENDS or gaps correspond to reactions with reactants or products that are not produced or consumed in other parts of the metabolic network model, and suggest poorly characterized sections of the network. These gaps are currently being reconciled with genome annotations. PATHWAY REDUNDANCY is a measure of how many systemically independent pathways have equivalent input and output fluxes – a quantitative description of network flexibility (Fig. 4c) – and can be computed from either the extreme pathways or the elementary modes (when the extreme pathways and elementary modes are different for a particular network, the quantitative measures of pathway redundancy might differ). An extreme pathway analysis of amino-acid
[Figure 4: four schematic panels, (a) Minimal medium, (b) Unutilized reactions, (c) Pathway redundancy and (d) Correlated reaction sets/enzyme subsets, drawn with pathways P1 and P2, reactions RA to RG, inputs A to E and biomass precursors X, Y and Z.]
Fig. 4. Systemic metabolic network properties can be elucidated from extreme pathway and elementary mode analysis, and systems properties can be quantified with network-based analyses of metabolic networks. (a) Predictions of a minimal medium can be made with pathway analysis, including a characterization of which substrates are required for which biomass constituents. Extreme pathways can enumerate which inputs (A, B, C, D, E) are required (B and E) and not required (A, C and D) for the synthesis of the biomass precursors (X, Y, Z) [15,16]. (b) Pathway analysis also characterizes unused reactions, which can lead to refined annotations and network reconstruction. Note that one reaction (RA) is not used in any extreme pathway, indicating that it is an unconnected part of the network. Associated reactions or metabolites have not yet been identified. This ‘gap’ analysis can lead to refinements of genome annotation (see [15,16,36]). (c) Pathway analysis also enables a quantifiable description of pathway redundancy or network robustness. Pathway redundancy is demonstrated by P1 and P2. Both of these pathways take up A and secrete D, yet each uses different reactions to accomplish this objective. A high degree of pathway redundancy would imply that overall metabolic transformations might be able to tolerate single-gene knockouts or drug inhibition (see [3,17,18]). (d) Systemically correlated reaction sets are also identified with the pathway analyses described herein. Inferences about likely regulation can be deduced from the extreme pathways or elementary modes of a metabolic network. Three of the reactions (RA, RB, and RG) are always utilized together, here with equivalent flux ratios. A testable hypothesis from this analysis is that the genes corresponding to these three reactions are coordinately regulated. 
Also, if one or two of these activities are assigned to open reading frames (ORFs) in the gene sequence, it is likely that others are present if their ORFs have not yet been assigned (see [38,39]).
synthesis in genome-scale models of H. influenzae and H. pylori revealed a significant difference in pathway redundancy between the two organisms [17,18]. Pathway redundancy has also been assessed for a model of core E. coli metabolism using elementary modes [3]. In this study, robustness to gene deletions correlated with pathway redundancy. One consequence of a high degree of redundancy is that it might be difficult to assess the internal state of a cell because there can be multiple ways to produce the same behavior. Experimental evidence for the existence of different ways to generate a particular phenotype is emerging [37].

Moreover, systemically correlated reactions of genome-scale reaction networks (known as enzyme subsets within elementary mode analysis [38]) have been identified using extreme pathways [39]. These correlated sets could have significant implications in understanding the regulatory structure of metabolic networks (Fig. 4d). These reactions could reside in the same operon in prokaryotes or belong to the same regulon in eukaryotes, optimizing the demands of regulatory activity. Consistent with this, members of the enzyme subsets of the central metabolism of S. cerevisiae have been found to show correlated changes in gene expression during the diauxic shift [40].

Network-based pathway analysis has recently been extended to incorporate other cellular functions. A formalism for incorporating regulatory logic into models of metabolic networks has been applied to extreme pathway analysis [41]. Extreme pathways are eliminated when they violate regulatory rules (e.g. reaction A does not take place when metabolite M is present) associated with environment-specific conditions. Under environment-specific conditions, between 67.5% and 97.5% of the extreme pathways were eliminated in a sample metabolic system, giving an environment-specific list of potential network functions. Recent work has also analyzed the correlation between gene expression changes that have been predicted from elementary modes and experimentally observed values that have been published [3]. For a model of central metabolism in E. coli, the predicted ratios of gene-expression changes with changing substrate were correlated to published results. This result provided evidence for the validity of structural analysis of metabolic networks, such as elementary modes and extreme pathways, for evaluating cellular functions beyond metabolic processing.

A detailed kinetic model of the human red blood cell metabolic network has enabled a full, dynamic simulation of red blood cell function under various conditions such as environmental loads and enzyme deficiencies [42–45]. Thus, it has served as an important tool for investigating the biological applicability of pathway analysis. An initial elementary mode analysis of a scheme involving glucose metabolism, nucleotide metabolism, glutathione reduction and ATP consumption gave 21 elementary modes that could be grouped into six different overall reaction stoichiometries [29]. On the basis of a more complete metabolic model, the complete set of extreme pathways for metabolism in the human red blood cell has been enumerated, physiologically interpreted and classified based on their structural properties and functional capabilities [30]. It was demonstrated that the extreme pathways conservatively predict values that describe the maximal ATP:NADPH yield ratios the cell can sustain under a load, calculated from the kinetic model [30]. It was also shown that the nominal steady-state flux distribution in the red blood cell could be described by a few key extreme pathways. These important results show that, without any use of kinetic parameters (which are often subject to concerns of accuracy and availability), the metabolic function and ultimate capacity of a network can be largely captured by network-based pathways [46].

Applications to metabolic engineering
In addition to the network properties that have been elucidated with pathway analysis, work has been done to illustrate the applicability of extreme pathways and elementary modes to metabolic engineering. In 1996, a study reported the calculation of the elementary modes for the central metabolic reactions of E. coli [31]. These elementary modes were then used to identify optimal and sub-optimal yields of aromatic amino acids from carbohydrate substrates. Here, yield is defined as the maximum molar ratio of product to substrate (e.g. from one glucose molecule, no more than two molecules of pyruvate can be made). A strain of E. coli was then engineered and achieved the predicted yield values.
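Once a set of pathways is available, the molar yield of each one follows from its exchange fluxes alone. The sketch below illustrates this for a handful of made-up modes; the mode names, flux values and function names are hypothetical and are not data from [31]. Only the glucose-to-pyruvate bound of two is taken from the text.

```python
from fractions import Fraction

# Hypothetical pathways, each reduced to its net exchange fluxes
# (uptake of substrate and secretion of products, in molar units).
MODES = {
    "mode_a": {"glucose_in": 1, "pyruvate_out": 2},
    "mode_b": {"glucose_in": 1, "pyruvate_out": 1, "byproduct_out": 1},
    "mode_c": {"glucose_in": 2, "pyruvate_out": 3, "byproduct_out": 1},
    "mode_d": {"glucose_in": 1, "byproduct_out": 2},
}


def molar_yield(mode, substrate, product):
    """Moles of product secreted per mole of substrate taken up."""
    uptake = mode.get(substrate, 0)
    if uptake == 0:
        return None  # the pathway does not consume this substrate
    return Fraction(mode.get(product, 0), uptake)


def rank_by_yield(modes, substrate, product):
    """Pathways that use the substrate, sorted from highest to lowest yield."""
    scored = [(name, molar_yield(mode, substrate, product))
              for name, mode in modes.items()]
    scored = [(name, y) for name, y in scored if y is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


RANKED = rank_by_yield(MODES, "glucose_in", "pyruvate_out")
for name, y in RANKED:
    print(name, y)
```

Because every pathway is scored, the same ranking also exposes sub-optimal routes, which is what distinguishes this kind of analysis from a single optimization result.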
Whereas linear programming has been used to calculate optimal product yields in metabolic networks [47], pathway analysis has the advantage of determining alternate flux distributions with equivalent yields as well as pathways with near-optimal yields that might have other desired characteristics such as useful byproducts [48]. This concept and its implications have also been discussed in relation to pathway redundancy [17].

The utility of elementary modes has been further validated in a study of the methylotroph M. extorquens AM1 [33]. In this study, redundant C3–C4 interconversion pathways were identified through an elementary mode analysis of a subset of the metabolic network. Genes that were calculated to be redundant were experimentally deleted and the resulting strains grew nominally, validating the in silico predictions. In another recent publication, elementary mode analysis was applied to a model of S. cerevisiae intermediary metabolism, based on experimental data from a strain of S. cerevisiae [35]. The elementary modes were then used to study the effects of gene additions and deletions on the yield values for production of poly-β-hydroxybutyrate in a recombinant strain.

Recently, elementary modes were also used in a framework for assigning function to orphan genes using metabolomic data for a model of core metabolism in S. cerevisiae [34]. This concept could lead to further refinement of poorly characterized metabolic networks by
characterizing the function of mutants with the gene in question deleted.

Future challenges
Analytical and computational
Distinct challenges still remain in network-based pathway analysis. Most important are the computational obstacles: the enumeration of a convex basis for a matrix is an NP-complete problem [49,50]; no algorithm is known that solves this problem in polynomial time and, thus, it is computationally demanding to enumerate convex basis vectors for even small reaction networks. However, further advancements in theory (e.g. reformulation of extreme pathways and elementary modes into Petri nets [8]) and implementation (e.g. parallelization of the extreme pathway algorithm [49]) might lead to the calculation of network-based pathways for larger metabolic networks.

Methods have been developed to deal, in part, with the computational complexity of calculating network-based pathways. One approach to overcome this obstacle of computational intractability involves dividing the networks into subsystems. This subdivision has been done with both traditional groupings, such as nucleotide metabolism and lipid synthesis [15,16], and algorithmically defined groupings based on the local connectivity of metabolites [51]. However, groupings such as these neglect the combinatorial possibilities that exist because metabolites are shared between subsystems. In an attempt to address this problem, the elementary modes program METATOOL [38] replaces all enzyme subsets by their overall reaction. This approach effectively removes the reactions and metabolites within the subset and reduces the size of the problem for calculation purposes. In addition, this reduction has no effect on the pathways calculated, and the subsets in the solution can then be re-expanded to the original reactions.
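The idea of replacing an enzyme subset by its overall reaction can be illustrated on a linear chain. The Python sketch below is not METATOOL's algorithm; it is a minimal illustration that assumes all reactions are irreversible and that an internal metabolite with exactly one producing and one consuming reaction forces those two reactions to carry flux in a fixed ratio at steady state. The function name `lump_through` and the example network are hypothetical.

```python
from fractions import Fraction


def lump_through(reactions, metabolite):
    """Merge the single producer and single consumer of an internal metabolite.

    `reactions` maps reaction names to {metabolite: coefficient} dicts
    (negative = consumed, positive = produced); all reactions are assumed
    irreversible. If the metabolite does not have exactly one producer and
    one consumer, an unchanged copy of the network is returned.
    """
    producers = [n for n, r in reactions.items() if r.get(metabolite, 0) > 0]
    consumers = [n for n, r in reactions.items() if r.get(metabolite, 0) < 0]
    if len(producers) != 1 or len(consumers) != 1:
        return dict(reactions)
    p, c = producers[0], consumers[0]
    # Steady state on the metabolite fixes the flux ratio v_c / v_p.
    ratio = Fraction(reactions[p][metabolite], -reactions[c][metabolite])
    merged = {}
    for met, coeff in reactions[p].items():
        merged[met] = merged.get(met, 0) + Fraction(coeff)
    for met, coeff in reactions[c].items():
        merged[met] = merged.get(met, 0) + ratio * coeff
    # The lumped metabolite (and anything else that cancels) drops out.
    merged = {met: coeff for met, coeff in merged.items() if coeff != 0}
    lumped = {n: r for n, r in reactions.items() if n not in (p, c)}
    lumped[p + "+" + c] = merged
    return lumped


# Hypothetical chain A -> B -> 2 C, C -> D with internal metabolites B and C.
example_chain = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 2},
    "R3": {"C": -1, "D": 1},
}

if __name__ == "__main__":
    reduced = lump_through(lump_through(example_chain, "B"), "C")
    print(reduced)
```

Because the merged reaction name records its constituents, a pathway computed on the reduced network can be expanded back to the original reactions, in the spirit of the re-expansion step described above.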
Biological application
In addition to the computational and analytical challenges, many significant, open-ended questions remain regarding the biological application of extreme pathways and elementary modes. The incorporation of additional constraints, such as regulation, osmolarity, electroneutrality and thermodynamics [52,53], would expand the scope of network-based pathway analyses. Although some work has begun to evaluate the effects of transcriptional regulation with network-based pathways [3,41], this approach needs to be scaled up to genome-scale biological systems. Kinetic parameters can provide additional constraints (e.g. Vmax values restrict the flux that is allowed through given reactions and consequently lead to a ‘smaller’ space defined by the extreme pathways [31]). Previous work with metabolic control analysis has demonstrated the combination of kinetic and structural analysis of metabolic networks [54]; however, this work describes pathways that are non-unique. Frameworks for incorporating kinetic parameters and other constraints into unique network-based pathways need to be developed. How does kinetic regulation of individual enzymes relate to the regulation of systemic pathways? The incorporation of kinetic analyses into extreme pathways and elementary modes will be a
[Figure 5 diagram. Reconstruction: annotated genome, biochemical data and organism physiology are combined into genome-scale networks. Analysis: pathway analysis. Application: elucidation of systems properties, generation of biological hypotheses and design of engineered biological systems.]
Fig. 5. Pathway analysis and biological research. Genome science has enabled the integration of knowledge from biochemical experiments and cell physiology into in silico models of cellular systems. With these models, network-based pathway analyses can lead to important applications and further scientific information.
significant advancement in biological systems analysis and will enable a more complete description of cellular function for which kinetics are a dominating influence.

There is the additional challenge of extracting biological properties from large extreme pathway and elementary mode matrices. One recent approach involves the use of the singular value decomposition (SVD) [55]. From the set of extreme pathways, the SVD quantitatively characterizes the principal direction of the solution cone (see Fig. 2) and the primary trade-offs in a metabolic network. This principal direction is effectively the flux distribution that lies in the ‘center’ of all the available flux patterns in a metabolic network. The primary trade-offs characterized by SVD are the ratios of fluxes that describe how the network changes from one phenotype to another.

Undoubtedly, the number of network-based pathways, compared with that of traditional pathways, is a deterrent for their use as a conceptual framework in biological research. However, the ability to use network-based pathways as a tool for mathematically describing metabolic networks and for quantifying objectives of metabolic engineering is a distinct and important advantage. Certainly, more work is needed before all of the biological information contained in these pathways can be deciphered in a manageable way.

Concluding remarks
Metabolic pathway analysis is an increasingly important driving force in current biochemical research, providing an important link between the reconstruction of biological models that assimilate the vast amounts of data in the post-genome era and the current demands and questions that are subsequently posed (Fig. 5). This review aims to put into perspective the history of pathway analysis, associated advantages and disadvantages of existing approaches, remaining challenges and the important biological characterizations that have been made to date.
In addition, this review provides an impetus for the transition from historical and heuristic pathway definitions to unique, mathematically defined, network-based and unambiguous pathway definitions that fully characterize and quantitatively describe biochemical networks. Because we can now reconstruct large-scale reaction networks and we need to be able to analyze the associated biology of these systems, it is timely for a thorough
evaluation of metabolic pathway analyses. It is clear that although conventionally defined pathways will continue to be an important part of research and teaching in biology, mathematical frameworks are required to describe properties of large metabolic networks. In addition, these frameworks will certainly be an important foundation for the developing analyses of signaling and regulatory networks. Network-based pathway analysis will consequently help drive the development of discoveries in biochemical research.

Acknowledgements
We acknowledge the support of the National Science Foundation (BES 01-20363), the National Institutes of Health (GM 57089) and the Whitaker Foundation (Graduate Research Fellowship to JP). We also thank Timothy Allen, Derren Barken, Natalie Duarte, Steve Fong and Thuy Vo for critical readings of the article.
References
1 Covert, M.W. et al. (2001) Metabolic modeling of microbial strains in silico. Trends Biochem. Sci. 26, 179–186
2 Edwards, J.S. et al. (2002) Metabolic modelling of microbes: the flux-balance approach. Environ. Microbiol. 4, 133–140
3 Stelling, J. et al. (2002) Metabolic network structure determines key aspects of functionality and regulation. Nature 420, 190–193
4 Michal, G. (1999) Biochemical Pathways: An Atlas of Biochemistry and Molecular Biology, Wiley, Spektrum
5 Karp, P.D. et al. (1999) Integrated pathway-genome databases and their role in drug discovery. Trends Biotechnol. 17, 275–281
6 Karp, P.D. (2001) Pathway databases: a case study in computational symbolic theories. Science 293, 2040–2044
7 Ouzounis, C.A. and Karp, P.D. (2000) Global properties of the metabolic map of Escherichia coli. Genome Res. 10, 568–576
8 Oliveira, J.S. et al. (2001) An algebraic-combinatorial model for the identification and mapping of biochemical pathways. Bull. Math. Biol. 63, 1163–1196
9 Seressiotis, A. and Bailey, J.E. (1988) MPS: an artificially intelligent software system for the analysis and synthesis of metabolic pathways. Biotechnol. Bioeng. 31, 587–602
10 Mavrovouniotis, M.L. et al. (1990) Computer-aided synthesis of biochemical pathways. Biotechnol. Bioeng. 36, 1119–1132
11 Schuster, S. and Hilgetag, C. (1994) On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 2, 165–182
12 Schuster, S. et al. (2000) A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat. Biotechnol. 18, 326–332
13 Schilling, C.H. et al. (1999) Metabolic pathway analysis: basic concepts and scientific applications in the post-genomic era. Biotechnol. Prog. 15, 296–303
14 Schilling, C.H. et al. (2000) Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol. 203, 229–248
15 Schilling, C.H. et al. (2002) Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriol. 184, 4582–4593
16 Schilling, C.H. and Palsson, B.O. (2000) Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis. J. Theor. Biol. 203, 249–283
17 Papin, J.A. et al. (2002) The genome-scale metabolic extreme pathway structure in Haemophilus influenzae shows significant network redundancy. J. Theor. Biol. 215, 67–82
18 Price, N.D. et al. (2002) Determination of redundancy and systems properties of Helicobacter pylori's metabolic network using genome-scale extreme pathway analysis. Genome Res. 12, 760–769
19 Fell, D.A. (1992) Metabolic control analysis: a survey of its theoretical and experimental development. Biochem. J. 286, 313–330
20 Reder, C. (1988) Metabolic control theory: a structural approach. J. Theor. Biol. 135, 175–201
21 Heinrich, R. et al. (1977) Metabolic regulation and mathematical models. Prog. Biophys. Mol. Biol. 32, 1–82
22 Kacser, H. and Burns, J.A. (1973) The control of flux. Symp. Soc. Exper. Biol. 27, 65–104
23 Lay, D.C. (1997) Linear Algebra and Its Applications, Addison Wesley Longman
24 Rockafellar, R.T. (1970) Convex Analysis, Princeton University Press
25 Clarke, B.L. (1988) Stoichiometric network analysis. Cell Biophys. 12, 237–253
26 Clarke, B.L. (1980) Stability of complex reaction networks. In Advances in Chemical Physics (Prigogine, I. and Rice, S.A., eds), pp. 1–215, John Wiley, New York
27 Stephanopoulos, G. et al. (1998) Metabolic Engineering, Academic Press
28 Klamt, S. (2002) Calculability analysis in underdetermined metabolic networks illustrated by a model of the central metabolism in purple nonsulfur bacteria. Biotechnol. Bioeng. 77, 734–751
29 Schuster, S. et al. (1998) Elementary mode analysis illustrated with human red cell metabolism. In BioThermoKinetics in the Post Genomic Era (Larsson, D. et al., eds), pp. 332–339, Chalmers
30 Wiback, S.J. and Palsson, B.O. (2002) Extreme pathway analysis of human red blood cell metabolism. Biophys. J. 83, 808–818
31 Liao, J.C. et al. (1996) Pathway analysis, engineering and physiological considerations for redirecting central metabolism. Biotechnol. Bioeng. 52, 129–140
32 Schilling, C.H. et al. (2000) Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems. Biotechnol. Bioeng. 71, 286–306
33 Van Dien, S.J. and Lidstrom, M.E. (2002) Stoichiometric model for evaluating the metabolic capabilities of the facultative methylotroph Methylobacterium extorquens AM1, with application to reconstruction of C(3) and C(4) metabolism. Biotechnol. Bioeng. 78, 296–312
34 Forster, J. et al. (2002) A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol. Bioeng. 79, 703–712
35 Carlson, R. et al. (2002) Metabolic pathway analysis of a recombinant yeast for rational strain development. Biotechnol. Bioeng. 79, 121–134
36 Dandekar, T. et al. (1999) Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochem. J. 343, 115–124
37 Flores, S. et al. (2002) Analysis of carbon metabolism in Escherichia coli strains with an inactive phosphotransferase system by (13)C labeling and NMR spectroscopy. Metab. Eng. 4, 124–137
38 Pfeiffer, T. et al. (1999) METATOOL: for studying metabolic networks. Bioinformatics 15, 251–257
39 Papin, J.A. et al. (2002) Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Res. 12, 1889–1900
40 Schuster, S. et al. (2002) Use of network analysis of metabolic systems in bioengineering. Bioprocess Biosyst. Eng. 24, 363–372
41 Covert, M.W. and Palsson, B.O. (2003) Constraints-based models: regulation of gene expression reduces the steady-state solution space. J. Theor. Biol. 221, 309–325
42 Jamshidi, N. et al. (2001) Dynamic simulation of the human red blood cell metabolic network. Bioinformatics 17, 286–287
43 Joshi, A. and Palsson, B.O. (1989) Metabolic dynamics in the human red cell. Part I: A comprehensive kinetic model. J. Theor. Biol. 141, 515–528
44 Mulquiney, P.J. and Kuchel, P.W. (1999) Model of 2,3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed kinetic equations: computer simulation and metabolic control analysis. Biochem. J. 342, 597–604
45 Rapoport, T.A. and Heinrich, R. (1975) Mathematical analysis of multienzyme systems. I: modelling of glycolysis of human erythrocytes. Biosystems 7, 120–129
46 Bailey, J.E. (2001) Complex biology with no parameters. Nat. Biotechnol. 19, 503–504
47 Edwards, J.S. et al. (1999) Metabolic flux balance analysis. In Metabolic Engineering (Lee, S.Y., Papoutsakis, E.T. et al., eds), Marcel Dekker
48 Schuster, S. et al. (1999) Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 17, 53–60
49 Samatova, N.F. et al. (2002) Parallel out-of-core algorithm for genome-scale enumeration of metabolic systematic pathways. In Proceedings of the First IEEE Workshop on High Performance Computational Biology (HiCOMB2002)
50 Garey, M.R. and Johnson, D.S. (1983) Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman
51 Schuster, S. et al. (2002) Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae. Bioinformatics 18, 351–361
52 Price, N.D. et al. (2002) Extreme pathways and Kirchhoff's second law. Biophys. J. 83, 2879–2882
53 Palsson, B.O. (2000) The challenges of in silico biology. Nat. Biotechnol. 18, 1147–1150
54 Heinrich, R. and Schuster, S. (1996) The Regulation of Cellular Systems, Chapman and Hall
55 Price, N.D. et al. (2003) Analysis of metabolic capabilities using singular value decomposition of extreme pathway matrices. Biophys. J. 84, 794–804