Theoretical analysis of amino acid-producing Escherichia coli using a stoichiometric model and multivariate linear regression


JOURNAL OF BIOSCIENCE AND BIOENGINEERING Vol. 102, No. 1, 34–40. 2006 DOI: 10.1263/jbb.102.34

© 2006, The Society for Biotechnology, Japan

Theoretical Analysis of Amino Acid-Producing Escherichia coli Using a Stoichiometric Model and Multivariate Linear Regression

Stephen J. Van Dien,1§* Shintaro Iwatani,1 Yoshihiro Usuda,1 and Kazuhiko Matsui1

Functional Genomics Group, Institute of Life Sciences, Ajinomoto Co., Inc., 1-1 Suzuki-cho, Kawasaki-ku, Kawasaki 210-8681, Japan1

Received 27 June 2005/Accepted 7 April 2006

This work demonstrates a novel computational approach combining flux balance modeling with statistical methods to identify correlations among fluxes in a metabolic network, providing insight as to how the fluxes should be redirected to achieve maximum product yield. The procedure is demonstrated using the example of amino acid production from an industrial Escherichia coli production strain and a hypothetical engineered strain overexpressing two heterologous genes. Regression analysis based on a random sampling of 5000 points within the feasible solution space of the E. coli stoichiometric network suggested that increased activity of the glyoxylate cycle or PEP carboxylase and elimination of malic enzyme will improve lysine and arginine synthesis.

[Key words: metabolic flux analysis, lysine, multivariate regression, Escherichia coli]

Metabolic flux analysis (MFA), also known as constraint-based analysis or stoichiometric modeling, is a technique using linear optimization to estimate intracellular metabolic reaction rates (fluxes) based upon only the underlying stoichiometry of reactions, metabolic demands, and any known system constraints. Due to their simplicity in structure, metabolic flux models are useful in the study of entire metabolic networks, for which the development of detailed kinetic models is limited by both data availability and computational power. This technique has become an important tool for studying the capabilities of biochemical reaction systems in microorganisms, and for predicting the intracellular metabolic flux distributions under different environmental conditions (1–3). A relevant example is given by Vallino and Stephanopoulos, who have used such models to aid in the metabolic engineering of lysine production in Corynebacterium glutamicum (4). In addition to C. glutamicum, Escherichia coli is one of the most important industrial organisms for production of amino acids (5). Stoichiometric models have also been constructed for E. coli (6–8), but never applied specifically to amino acid-producing strains.

In practice, MFA involves writing a mass balance on each metabolite in the network, applying a steady-state assumption, and solving the resulting system of equations. Since there are usually more unknown reaction rates (metabolic fluxes) than metabolites, due to the presence of cycles and alternative pathways, the system of equations is underdetermined and thus does not have a unique solution. A common method to obtain a unique flux distribution in such a case is to define an objective function, such as biomass production, and use linear programming to find the optimal solution with respect to this objective function (1, 6). Much effort has been spent recently determining optimal flux distributions of wild-type or mutant E. coli by the optimization of objective functions (9), calculation of elementary modes (10), or the analysis of phase planes (7, 11, 12). Although these methods have been extremely successful in the analysis of comprehensive metabolic networks, comparatively little attention has been paid to non-optimal solutions. Such solutions correspond more closely to actual mutants or engineered strains used in industrial processes, since these strains have not been given sufficient time to evolve to optimality. One of the goals of metabolic engineering is to modify the flux distributions of current process strains so that they approach optimality. In this work we apply statistical techniques to determine the relationship between key fluxes and lysine production (as well as two other exemplary amino acid products), and make predictions on how to engineer an improved lysine production strain.

MATERIALS AND METHODS

Construction of the stoichiometric model

An overall flux balance was developed using the well-established principles of stoichiometric analysis with application of the pseudo-steady state hypothesis to the intracellular metabolites (4, 13). The reactions included in this model are listed in the Appendix, and include all of central carbon metabolism and the amino acid biosynthetic pathways of E. coli. These pathways have all been characterized enzymatically and genetically, and are readily available from various sources (14). Biomass composition was assumed to be equal to that previously measured for E. coli at high growth rates (6, 15). Minor changes were made to eliminate futile cycles: PEP carboxylase was removed, and malic enzyme was forced to be irreversible. Instead, all futile cycling was lumped into a single ATPase reaction, to represent conversion of all excess ATP to ADP.
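The mass balance and steady-state assumption described above can be illustrated on a very small network. The sketch below is only an illustration under stated assumptions: the six-reaction network, its metabolite names, and the helpers `matrix_rank` and `is_steady_state` are hypothetical and are not the 68-reaction model of this paper. It shows why the system is underdetermined: the number of degrees of freedom equals the number of reactions minus the rank of the stoichiometric matrix.

```python
from fractions import Fraction

# Hypothetical toy network (NOT the model of this paper). Rows are the
# internal metabolites A, B, C; columns are the reactions v1..v6:
#   v1: -> A      v2: A -> B    v3: A -> C
#   v4: B -> C    v5: B ->      v6: C ->
S = [
    [1, -1, -1,  0,  0,  0],  # balance on A
    [0,  1,  0, -1, -1,  0],  # balance on B
    [0,  0,  1,  1,  0, -1],  # balance on C
]

def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

def is_steady_state(stoich, fluxes):
    """True if every metabolite balance S.v equals zero."""
    return all(sum(s * v for s, v in zip(row, fluxes)) == 0 for row in stoich)

n_reactions = len(S[0])
degrees_of_freedom = n_reactions - matrix_rank(S)

# One of infinitely many flux vectors that satisfy the balances.
v = [10, 6, 4, 1, 5, 5]
print(degrees_of_freedom, is_steady_state(S, v))
```

In this toy case three fluxes must be specified before the rest are fixed by the balances; an objective function or a set of chosen free fluxes plays that role in a real model.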
The stoichiometric matrix was constructed using the Fluxanalyzer program (10, 16), which operates within the Matlab environment (The Mathworks, Natick, MA, USA). The resulting matrix could then be saved as a file, for use with other Matlab functions outside of Fluxanalyzer.

* Corresponding author. e-mail: [email protected] phone: +1-858-824-1771 fax: +1-858-824-1772
§ Present address: Genomatica, Inc., 5405 Morehouse Drive, Suite 210, San Diego, CA 92121, USA.

TABLE 1. List of free fluxes used to generate the set of random flux distributions

Reaction number    Enzyme or pathway name
2                  Glucose-6-phosphate dehydrogenase
15                 PEP carboxylase
16                 Acetate secretion
60                 Isocitrate lyase (glyoxylate cycle)
62                 Malic enzyme
64                 Formate secretion
66                 ATPase

Calculation of elementary modes

Elementary modes were calculated using Fluxanalyzer, and normalized to a basis of 10 mmol glucose uptake. The matrix of elementary modes was saved as a file, and a simple Matlab function was used to sort them in decreasing order of biomass yield, lysine production, or overall carbon yield. Carbon yield is defined as the total number of carbon atoms in both biomass and lysine.

Principal components analysis and multivariate regression

Singular groups of fluxes were determined (17), and one flux from each group was chosen to give seven free fluxes (Table 1). Specification of all seven free fluxes provides a unique solution to the flux balance. A data set of 5000 different metabolic flux distributions was generated by choosing random values of the free fluxes within specified bounds, all for a basis of 10 mmol glucose uptake. Only flux distributions that did not violate any of the irreversibility constraints were allowed, and those sets not producing either lysine or biomass above a threshold level of 20% maximum were excluded. The result was a matrix with 68 columns, each corresponding to a reaction flux, and 5000 rows, each corresponding to a different random distribution. Principal components analysis (18) was performed on the Z-scores of the matrix columns, to determine the few linear combinations of fluxes that explain most of the variation between the different distributions. Thus, it helps to cluster fluxes that behave similarly. Z-scores are defined as the deviation of each column from the mean, normalized by the standard deviation.
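The sampling and Z-score steps just described can be sketched on a toy problem. This is a minimal illustration, assuming a hypothetical six-reaction network with two free fluxes (`v3`, `v4`) and an uptake basis of 10; the function names, bounds, and sample size are assumptions chosen for readability, and only the irreversibility filter is reproduced (the 20% production threshold is omitted).

```python
import random
import statistics

UPTAKE = 10.0  # substrate uptake basis, analogous to 10 mmol glucose

def flux_distribution(v3, v4):
    """Solve the toy steady-state balances given the two free fluxes.

    Hypothetical network: v1: -> A, v2: A -> B, v3: A -> C,
    v4: B -> C, v5: B -> (product), v6: C -> (biomass).
    """
    v2 = UPTAKE - v3  # balance on A
    v5 = v2 - v4      # balance on B
    v6 = v3 + v4      # balance on C
    return [UPTAKE, v2, v3, v4, v5, v6]

def sample_feasible(n_points, seed=0, upper=12.0):
    """Rejection sampling: keep only distributions with no negative flux."""
    rng = random.Random(seed)
    accepted, tried = [], 0
    while len(accepted) < n_points:
        tried += 1
        v = flux_distribution(rng.uniform(0.0, upper), rng.uniform(0.0, upper))
        if min(v) >= 0.0:  # all reactions treated as irreversible here
            accepted.append(v)
    return accepted, tried

def z_scores(column):
    """Deviation from the column mean, scaled by the standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # a constant column (sd = 0) must be skipped
    return [(x - mean) / sd for x in column]

samples, tried = sample_feasible(500)
acceptance_rate = len(samples) / tried
product_z = z_scores([v[4] for v in samples])  # Z-scores of the product flux
print(f"accepted {len(samples)} of {tried} candidate points")
```

The ratio of accepted to attempted points is the quantity that, in the full model, determines how many random draws are needed to collect 5000 admissible distributions.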
Multivariate linear regression was then performed on a reduced matrix, containing the Z-scores of only the columns corresponding to the seven free fluxes. The stepwise regression function in the Matlab statistical toolbox was used to find the best fit correlation function between the free fluxes and either biomass yield or lysine production.
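As a rough illustration of the regression step, the sketch below fits an ordinary least-squares model and adds terms greedily until a target R² is reached. It is not the Matlab stepwise routine, which uses significance tests for entering and removing terms; the R² threshold, the helper names (`solve`, `fit`, `forward_select`), the synthetic data, and the coefficients used to generate them are all assumptions made for this example.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(columns, y):
    """Least squares with intercept via normal equations; returns (beta, R^2)."""
    rows = [[1.0] + [c[k] for c in columns] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yk in zip(rows, y))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return beta, 1.0 - ss_res / ss_tot

def forward_select(named_columns, y, target_r2):
    """Greedy forward selection: add the term that raises R^2 the most."""
    chosen, r2 = [], 0.0
    remaining = set(named_columns)
    while remaining and r2 < target_r2:
        best = max(sorted(remaining), key=lambda name: fit(
            [named_columns[c] for c in chosen + [name]], y)[1])
        chosen.append(best)
        remaining.remove(best)
        r2 = fit([named_columns[c] for c in chosen], y)[1]
    return chosen, r2

# Synthetic data: the output depends strongly on two inputs, weakly on a third.
rng = random.Random(1)
data = {name: [rng.uniform(0.0, 5.0) for _ in range(200)]
        for name in ("ICL", "ME", "ATPase")}
y = [1.0 + 1.6 * a - 1.5 * b + 0.01 * c
     for a, b, c in zip(data["ICL"], data["ME"], data["ATPase"])]

chosen, r2 = forward_select(data, y, target_r2=0.99)
full_r2 = fit(list(data.values()), y)[1]
print(chosen, round(r2, 4), round(full_r2, 4))
```

Because the synthetic output is an exact linear function of the three inputs, the full model fits perfectly, while a model with only the two dominant terms already passes the threshold; this mirrors the trade-off between the number of terms and goodness of fit discussed in the Results.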

RESULTS AND DISCUSSION

Calculation of elementary modes and optimal lysine yield

The stoichiometric model of E. coli central metabolism was constructed and analyzed as described in Materials and Methods. Elementary modes were calculated using Fluxanalyzer, and normalized to a basis of 10 mmol glucose uptake. A total of 385 elementary modes were found, of which 283 result in biomass production and 133 have lysine secretion. Fifty-six have both lysine secretion and cell growth. The network appears to be highly redundant, as over 30 elementary modes are within 1% of optimum biomass yield, and seven have lysine production within 5% of maximum. The theoretical maximum biomass yield is 1.179 g DCW per 10 mmoles glucose feed (81.8% carbon yield), and can occur either with or without utilization of either the oxidative pentose phosphate or glyoxylate pathways. None of the highest biomass-producing elementary modes predict consumption of excess ATP through futile cycles, suggesting that cell growth is energy-limited. There is predicted to be an excess of NADH produced by central metabolism, so that NADH is converted both to ATP, by the electron transport chain, and to NADPH by the transhydrogenase. When the transhydrogenase is removed from the network, the maximum predicted yield drops slightly, and there is an absolute requirement for the oxidative pentose phosphate pathway in this case (data not shown).

The maximum theoretical lysine production is 8.40 mmol/10 mmol glucose (84.0% carbon yield). The glyoxylate and/or TCA cycles operate just enough to produce the required NADH, and all of the remaining carbon is directed exclusively to lysine. Neither the glyoxylate nor the oxidative pentose phosphate pathway is predicted to be necessary for lysine production within 1% of maximum. However, the transhydrogenase appears to be necessary, even when the pentose phosphate pathway is operating. Removal of this enzyme results in a drop in predicted yield to 69.2%.

Simulation of recombinant strains

Two genes present in the C. glutamicum genome (19, 20), but not in that of E. coli, are those encoding the enzymes meso-diaminopimelate dehydrogenase (DDH) and pyruvate carboxylase. The first of these catalyzes a bypass of the lysine synthesis reactions nos. 30 and 31, and is more energy efficient because it does not involve the conversion of succinyl-CoA to succinate (21). Pyruvate carboxylase converts pyruvate to oxaloacetate at the expense of one ATP. In wild-type E. coli this net conversion, via PEP synthase and PEP carboxylase, requires transfer of two high-energy phosphate groups (22). These pathways have been reported to influence lysine production for C. glutamicum (23, 24), so they are obvious targets for recombinant expression in lysine-producing E. coli.
Such an engineered strain was simulated by augmenting the stoichiometric matrix with these two reactions (nos. 69 and 70), and repeating the elementary mode calculations described above. The addition of just these two reactions increased the number of elementary modes greatly, to 1768. The maximum predicted biomass yield increased only slightly, from 1.007 to 1.009 g DCW/10 mmoles glucose, due to the increased efficiency of synthesizing proteinogenic lysine. The maximum lysine-producing elementary mode increased more significantly, from 8.33 to 8.57 mmoles/10 mmoles glucose. This elementary mode utilizes DDH to produce all lysine, and also requires pyruvate carboxylase.

Principal components analysis

A dataset of 5000 potential flux distributions was created by choosing random values for seven free fluxes, belonging to different singular flux groups (17), within specified bounds for a basis of 10 mmol glucose uptake (Table 1). Principal components analysis (PCA) was applied to cluster the various metabolic fluxes in the network into a smaller number of factors. In essence, this technique transforms data of 68 dimensions onto a coordinate system of far fewer dimensions, without significant loss of information (25). Each axis in the new coordinate system is called a principal component (referred to here as PC no. 1, PC no. 2, etc.) and is a specific linear combination of the original variables, which in this case are the 68 reaction fluxes. The eigenvalues of the principal components indicate their relative importance, and are always in descending order, with the first PC being the most significant. When the eigenvalues are normalized so that they sum to unity, those corresponding to the first seven PCs are as follows: 0.449, 0.240, 0.165, 0.063, 0.0425, 0.023, and 0.017. The remaining PCs sum to less than 0.001, indicating that these first seven components account for 99.9% of the variation between the different flux distributions. The first three PCs together account for 85.4% of the variation in the data, so we focused on these.

Figure 1 is a bar graph showing the component loading; i.e., the contribution of each reaction to the first two principal components. This represents the vector of coefficients by which the flux vector is multiplied to obtain the value of that particular principal component. PC no. 1 has positive coefficients only for biomass production and the synthetic pathways leading to biomass precursors, indicating that it is essentially a measure of cell growth. In fact, biomass correlates directly with PC no. 1 (R² = 0.99). PC no. 2 has large positive values for those fluxes producing NAD(P)H, as well as those responsible for the conversion of NADH to ATP, and ATP consumption. It has negative values for those fluxes involved in amino acid or biomass synthesis. Finally, PC no. 3 has positive values for the pentose phosphate pathway fluxes, negative values for glycolytic fluxes, and essentially zero for the others (data not shown). A scatter plot showing the location of each reaction flux projected into the coordinate space of the first two principal components is shown in Fig. 2.

FIG. 1. Loading plots for principal components nos. 1 (a) and 2 (b). The x-axis indicates the reaction number, as defined in the Appendix. Length of each bar represents the coefficient for that particular reaction in the equation defining the principal component.

Multivariate regression analysis

Next, multivariate correlation analysis was performed on the same 5000 point data set, using the stepwise regression function in the Matlab statistical toolbox. The purpose of this technique is to derive an equation giving either biomass or lysine production as a linear function of the seven free fluxes. Since specification of these seven fluxes uniquely defines the state of the system, if we use all seven terms the correlation coefficient R² is 1, meaning that the fit is perfect. However, as in the case of PCA, it is often possible to obtain a relatively good fit using just a few of the terms in the equation. Biomass was fit to R² = 0.978 with just four terms: isocitrate lyase, malic enzyme, PEP carboxylase, and ATPase. No models with fewer terms give a reasonable fit. The regression equation is as follows, where the inputs are the reaction fluxes normalized per 10 mmol glucose:

Biomass = 2.051 − 0.247 (ICL) + 0.247 (ME) − 0.252 (PEP carb) − 0.0069 (ATPase)    (1)

Lysine can be fit using a model containing the same four parameters, and the resulting R² is 0.996; however, removal of the ATPase only causes a drop in R² to 0.951, which is still very good. Thus we used the following 3-parameter model for lysine:

Lysine = −4.293 + 1.671 (ICL) − 1.545 (ME) + 1.590 (PEP carb)    (2)
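To see how Eqs. 1 and 2 are read, the following sketch evaluates both equations for one set of free fluxes and then raises the isocitrate lyase flux by one unit while holding the others fixed. The coefficients are copied from the equations above; the flux values, the dictionary layout, and the `predict` helper are hypothetical and chosen only for illustration (the values are not claimed to be a feasible flux distribution).

```python
# Coefficients copied from Eqs. 1 and 2; inputs are fluxes per 10 mmol glucose.
BIOMASS_EQ1 = {"intercept": 2.051, "ICL": -0.247, "ME": 0.247,
               "PEPC": -0.252, "ATPase": -0.0069}
LYSINE_EQ2 = {"intercept": -4.293, "ICL": 1.671, "ME": -1.545, "PEPC": 1.590}

def predict(model, fluxes):
    """Evaluate a linear regression equation for one set of free fluxes."""
    return model["intercept"] + sum(
        coef * fluxes[name] for name, coef in model.items() if name != "intercept"
    )

# Hypothetical free-flux values, used only to demonstrate the arithmetic.
base = {"ICL": 1.0, "ME": 0.5, "PEPC": 3.0, "ATPase": 10.0}
more_icl = dict(base, ICL=base["ICL"] + 1.0)

lysine_gain = predict(LYSINE_EQ2, more_icl) - predict(LYSINE_EQ2, base)
biomass_change = predict(BIOMASS_EQ1, more_icl) - predict(BIOMASS_EQ1, base)
print(round(lysine_gain, 3), round(biomass_change, 3))
```

With all other free fluxes held constant, one extra unit of isocitrate lyase flux changes each output by exactly the ICL coefficient of the corresponding equation, which is why the signs of the coefficients indicate opposite effects on lysine and biomass.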

In general, this technique shows how a change in one free flux affects the cell output, given that all other free fluxes are held constant. The results indicate that biomass yield is positively correlated with malic enzyme, whereas lysine production is positively correlated with PEP carboxylase and isocitrate lyase (glyoxylate cycle) fluxes. Either of these reactions, as well as reactions directly coupled to them (such as malate synthase), could theoretically represent a bottleneck to production. The model tells us that increasing the flux through these pathways is likely to increase yield. However, it is not necessarily true that overexpressing any of the enzymes in these pathways will lead to an increased flux. The choice for genetic manipulation rests on which step in the group of coupled reactions has the lowest capacity (i.e., enzyme activity) in the existing strain.

The utility of regression analysis is demonstrated in Figs. 3 and 4. When considered separately, no correlation can be seen between lysine production and either the isocitrate lyase or malic enzyme fluxes (Fig. 3a, b), due to interference by changes in other free fluxes. However, when considered as part of the regression Eq. 2, their effect is very clear (Fig. 4). Therefore, this technique has been useful in identifying hidden relationships among the metabolic fluxes. In this case, we claim with 85.6% confidence that any genetic manipulations resulting in an increased value of Eq. 2 will have a positive effect on lysine yield. Therefore, isocitrate lyase and malic enzyme represent obvious targets for future strain development.

FIG. 2. The information from Fig. 1 represented as a scatter plot, with the axes being the first two principal components. Points are given labels to indicate to which pathway or portion of metabolism they belong. The fluxes chosen as free fluxes for the statistical analysis (see text) are labeled with boxes.

Some discussion is warranted as to how the seven free fluxes were chosen. Clearly, these fluxes cannot be chosen arbitrarily, since relationships between the flux values would result in redundant information in some cases. Instead, we determined seven groups of reaction fluxes so that any flux in one group can be varied without any effect on the fluxes within another group (17). Then, one flux from each group was taken as the free flux. Within each group the fluxes could be chosen arbitrarily, but we chose those near branch points, as they would be more obvious targets for metabolic engineering efforts. For example, we chose glucose-6-phosphate dehydrogenase as a free flux, but it would have been equally valid to choose transketolase. When knowledge is available on which enzymes are most easily manipulated (i.e., in an engineered strain), such reactions should be included in the set of free fluxes. Regardless of the choice of free fluxes, not all random combinations of these parameters produce biologically meaningful flux distributions. In fact, on average less than 1% of the chosen parameter distributions were acceptable. Thus the population size of 5000 actually represents a sampling of over 500,000 points in the 7-dimensional parameter space.

To demonstrate the application of this technique to other amino acids, the regression analysis was repeated for arginine and tryptophan-producing strains. The result for arginine was similar to that of lysine, although the fit to the 3-parameter model was not quite as good: R² = 0.913. The regression equation is as follows:

Arginine = −5.290 + 1.995 (ICL) − 1.878 (ME) + 1.942 (PEP carb)    (3)

The behavior of tryptophan yield was qualitatively more similar to that of biomass than to the other two amino acids. Four parameters were necessary to provide a reasonable fit (R² = 0.928), and each term has the same sign as in the biomass equation:

Tryptophan = 5.231 − 1.393 (ICL) + 1.388 (ME) − 1.420 (PEP carb) − 0.017 (ATPase)    (4)

Finally, multivariate regression was applied to the recombinant strain. The addition of two new enzymes and no new metabolites to the network increased the degrees of freedom by two; therefore, two additional free fluxes were defined. Logical choices for these fluxes were the new pathways, DDH (rx. 69) and pyruvate carboxylase (rx. 70). Surprisingly, pyruvate carboxylase but not DDH entered into the regression equations for biomass (R² = 0.967) and lysine (R² = 0.926), given below.

Biomass = 2.013 − 0.240 (ICL) + 0.237 (ME) − 0.244 (PEP carb) − 0.0067 (ATPase) − 0.244 (pyruvate carb)    (5)

Lysine = −3.738 + 1.592 (ICL) − 1.459 (ME) + 1.485 (PEP carb) + 1.470 (pyruvate carb)    (6)

FIG. 3. Plots showing lysine production yield as a function of different values of free fluxes using a random dataset of 5000 flux distributions. (a) Isocitrate lyase flux; (b) malic enzyme flux; (c) PEP carboxylase flux.

FIG. 4. Plot of lysine production as a function of the value of Eq. 2 for the random dataset of 5000 flux distributions. Inputs to the equation are fluxes given in mmol/h per 10 mmol/h glucose flux.

The fact that only pyruvate carboxylase contributed significantly to the regression equation for lysine production is surprising because the other enzyme, DDH, contributed significantly to the optimum and near-optimum elementary modes, and when added alone resulted in increased predicted maximum lysine yield (data not shown). The reason for this apparent contradiction is that the regression analysis involves all potential modes of metabolism, not just those that are very close to optimum. Most of these flux distributions have more significant energy inefficiencies than the synthesis of SDAP, and thus the addition of DDH has little effect. In agreement with this prediction is the recent observation that increased expression of ddh in C. glutamicum under normal process conditions had no effect on lysine yield (26). This example clearly demonstrates the difference between the two types of analysis used here, and that a thorough investigation requires not only an examination of the elementary modes or extreme vectors, but also the interior of the flux space.

Recently, stoichiometric modeling has been used to predict genetic alterations in organisms leading to improved growth or product yield (27–30), but the methodologies applied in those cases differ significantly from those described here. First of all, these reports describe either addition of a recombinant pathway, deletion of genes, or a selection of non-native genes from a universal reaction database. In contrast, our methods can be applied to (but are not necessarily limited to) the increase in expression levels of genes already present in the host strain. More importantly, these previous studies only considered optimum levels of production for each given reaction set. However, it is likely that industrial strains are not operating at such points, because they have not had sufficient time to evolve toward maximum efficiency. By applying the statistical analysis procedures on a random sample of all flux space, we obtain a more general model that is applicable to non-extreme cases. In particular, through the use of the regression function, predictions can be made as to how fluxes should be manipulated in order to more closely approach the optimal solutions.

To our knowledge, there have been no previous reports in the literature where multivariate regression has been applied to results from a metabolic flux model. We believe that the combination of statistical techniques such as this with stoichiometric modeling has potentially powerful applications in the metabolic engineering of improved amino acid-producing strains.

APPENDIX

List of reactions

[1] Glc + PEP → G6P + Pyr

[2] G6P + 2NADP → Ribu5P + 2NADPH + CO2
[3] r Ribu5P → R5P
[4] r Ribu5P → X5P
[5] r X5P + R5P → Sed7P + GAP
[6] r Sed7P + GAP → E4P + F6P
[7] r X5P + E4P → F6P + GAP
[8] r G6P → F6P
[9] r F6P + ATP → FBP + ADP
[10] r FBP → 2GAP
[11] r GAP + NAD + ADP → 3PG + NADH + ATP
[12] r 3PG → PEP
[13] PEP + ADP → Pyr + ATP
[14] Pyr + NAD + CoA → AcCoA + NADH + CO2
[15] PEP + CO2 → OAA
[16] AcCoA + ADP → AcOH + ATP + CoA
[17] AcCoA + OAA → Cit + CoA
[18] r Cit → Isocit
[19] r Isocit + NADP → αKG + NADPH + CO2
[20] αKG + NADPH + NH3 → Glu + NADP
[21] αKG + NAD + CoA → SucCoA + NADH + CO2
[22] r SucCoA + ADP → Suc + ATP + CoA
[23] r Suc + FAD → Fum + FADH
[24] r Fum → Mal
[25] r Mal + NAD → OAA + NADH
[26] OAA + Glu → Asp + αKG
[27] Asp + ATP + NADPH → ASA + ADP + NADP
[28] ASA + Pyr → DDP
[29] DDP + NADPH → THDP + NADP
[30] THDP + SucCoA + Glu → SDAP + αKG + CoA
[31] SDAP → mDAP + Suc
[32] mDAP → Lys + CO2
[33] r Glu + ATP + NH3 → Gln + ADP
[34] Glu + 2NADPH + ATP → Pro + 2NADP + ADP
[35] Glu + 5ATP + NADPH + Gln + Asp + AcCoA + CO2 → Arg + 5ADP + NADP + αKG + Fum
[36] ASA + NADPH → Hse + NADP
[37] Hse + SucCoA + Cys + mTHF → Met + Suc + CoA + THF + Pyr + NH3
[38] Hse + ATP → Thr + ADP
[39] Thr + Glu + NADPH + Pyr → Ile + αKG + NADP + NH3 + CO2
[40] r 3PG → Ser
[41] r Ser + THF → Gly + mTHF
[42] r PEP + E4P + NADPH → SKA + NADP
[43] CHR → PPA
[44] PPA + NAD + Glu → Tyr + NADH + CO2 + αKG
[45] PPA + Glu → Phe + CO2 + αKG
[46] CHR + R5P + 2ATP + Gln → Ind + Glu + Pyr + CO2 + GAP + 2ADP
[47] 2Pyr → ALC
[48] αIVA + Glu → Val + αKG
[49] Val + Pyr → ALA + αIVA
[50] αIVA + AcCoA + NAD + Glu → Leu + NADH + CO2 + αKG + CoA
[51] PRPP + ATP + Gln + Glu + 2NAD → His + ADP + Glu + αKG + 2NADH
[52] Ser + AcCoA + H2S → Cys + AcOH
[53] SKA + PEP + ATP → CHR + ADP
[54] Ind + Ser → Trp
[55] ALC + NADPH → αIVA + NADP + CO2
[56] r NADH → NADPH
[57] 2NADH + O2 + 2ADP → 5ATP + 2NAD
[58] 2FADH + O2 + 3ADP → 3ATP + 2FAD
[59] r Asp + 2ATP + NH3 → Asn + 2ADP
[60] Isocit → Glyox + Suc
[61] AcCoA + Glyox → Mal + CoA
[62] Mal + NAD → Pyr + CO2 + NADH
[63] r R5P + 2ATP → PRPP + 2ADP
[64] mTHF + NADP → NADPH + THF + Form
[65] NAD + Gly + THF → mTHF + NADH + CO2 + NH3
[66] ATP → ADP
[67] Lys → Lysext
[68] Biomass synthesis (see below)
[69]* THDP + NADPH + NH3 → mDAP + NADP
[70]* Pyr + ATP + CO2 → OAA + ADP

r, Reversible reaction.
* In recombinant strain simulation only (see text).

RNA (21.33%): 3.47 PRPP + 5.02 Gln + −5.02 Glu + 3.08 Gly + 6.17 Asp + 32.41 ATP + −32.41 ADP + 6.17 mTHF + −6.17 THF + 3.09 NAD + −3.09 NADH + 6.17 NADP + −6.17 NADPH + 1.16 CO2 + −3.47 Fum + −3.86 NH3.
DNA (3.23%): 3.37 PRPP + 4.88 Gln + −4.88 Glu + 3 Gly + 6 Asp + 31.5 ATP + −31.5 ADP + 7.12 mTHF + −7.12 THF + 3 NAD + −3 NADH + 3.75 NADP + −3.75 NADPH + 1.12 CO2 + −3.37 Fum + −3.75 NH3.
Phospholipid (9.47%): 20.8 AcCoA + −20.8 CoA + 1.95 GAP + 0.65 Ser + 44.2 ATP + −44.2 ADP + 38.35 NADH + −38.35 NAD + −0.65 CO2.
Peptidoglycan (2.60%): 1.94 F6P + 1.94 AcCoA + −1.94 CoA + 1.94 Gln + −1.94 Glu + 2.91 Ala + 0.97 PEP + 0.97 Lys + 6.97 ATP + −6.97 ADP + 0.97 NADPH + −0.97 NADP + −0.97 CO2.
LPS (3.54%): 0.91 R5P + 0.91 F6P + 0.91 PEP + 15.47 AcCoA + −0.91 AcOH + −0.91 Glu + 0.91 Gln + 32.76 ATP + 12.74 NADH.
Protein (57.23%): 0.77 Gly + 0.96 Ala + 0.67 Val + 0.85 Leu + 0.44 Ile + 0.44 Ser + 0.48 Thr + 0.30 Phe + 0.26 Tyr + 0.01 Trp + 0.15 Cys + 0.22 Met + 0.54 Lys + 0.46 Arg + 0.16 His + 0.46 Asp + 0.52 Glu + 0.46 Asn + 0.52 Gln + 0.34 Pro.
Glycogen (2.60%): F6P + ATP.

Abbreviations of metabolites

3PG : 3-phospho-D-glycerate
AcCoA : acetyl-coenzyme A
AcOH : acetate
αIVA : alpha-keto-isovaleric acid
αKG : 2-oxoglutaric acid
ALC : acetohydroxy acid
ASA : aspartate semialdehyde
CHR : chorismic acid
Cit : citric acid
DDP : dihydrodipicolinate
E4P : erythrose-4-phosphate
FBP : fructose bisphosphate
Form : formate
Fum : fumarate
GAP : glyceraldehyde-phosphate
Glyox : glyoxylate
Hse : homoserine
Ind : indole glycerol phosphate
Isocit : isocitrate
Mal : malic acid
mDAP : meso-diaminopimelate
mTHF : methyl-tetrahydrofolate
PPA : prephenate
PRPP : phosphoribosyl pyrophosphate
Pyr : pyruvic acid
R5P : ribose-5-phosphate
Ribu5P : ribulose-5-phosphate
SDAP : n-succinyl-L-2,6-diaminoheptanedioate
Sed7P : D-sedoheptulose-7-phosphate
SKA : shikimate
Suc : succinic acid
SucCoA : succinyl-coenzyme A
THDP : tetrahydrodipicolinate
THF : tetrahydrofolate
X5P : xylulose-5-phosphate

Commonly accepted abbreviations, such as those of the amino acids, are not listed.

REFERENCES

1. Varma, A. and Palsson, B. O.: Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild type Escherichia coli. Appl. Environ. Microbiol., 60, 3724–3731 (1994).
2. Schilling, C. H., Edwards, J. S., and Palsson, B. O.: Toward metabolic phenomics: analysis of genomic data using flux balances. Biotechnol. Prog., 15, 288–295 (1999).
3. Schilling, C. H., Schuster, S., Palsson, B. O., and Heinrich, R.: Metabolic pathway analysis: basic concepts and scientific applications in the post-genomic era. Biotechnol. Prog., 15, 296–303 (1999).
4. Vallino, J. J. and Stephanopoulos, G.: Metabolic flux distributions in Corynebacterium glutamicum during growth and lysine overproduction. Biotechnol. Bioeng., 41, 633–646 (1993).
5. Ikeda, M.: Amino acid production processes. Adv. Biochem. Eng. Biotechnol., 79, 1–35 (2003).
6. Pramanik, J. and Keasling, J. D.: Stoichiometric model of Escherichia coli metabolism: incorporation of growth-rate dependent biomass composition and mechanistic energy requirements. Biotechnol. Bioeng., 56, 398–421 (1997).
7. Ibarra, R. U., Edwards, J. S., and Palsson, B. O.: Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature, 420, 186–189 (2002).
8. Reed, J. L., Vo, T. D., Schilling, C. H., and Palsson, B. O.: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol., 4, R54 (2003).
9. Segre, D., Vitkup, D., and Church, G. M.: Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA, 99, 15112–15117 (2002).
10. Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S., and Gilles, E. D.: Metabolic network structure determines key aspects of functionality and regulation. Nature, 420, 190–193 (2002).
11. Edwards, J. S. and Palsson, B. O.: The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. USA, 97, 5528–5533 (2000).
12. Edwards, J. S., Ibarra, R. U., and Palsson, B. O.: In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol., 19, 125–130 (2001).
13. Savinell, J. M. and Palsson, B. O.: Network analysis of intermediary metabolism using linear optimization. I. Development of mathematical formalism. J. Theor. Biol., 154, 421–454 (1992).
14. Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S., and Pellegrini-Toole, A.: EcoCyc: electronic encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res., 28, 56 (2000).
15. Neidhardt, F. C., Ingraham, J. L., and Schaechter, M.: Physiology of the bacterial cell. Sinauer Associates, Sunderland, MA (1990).
16. Klamt, S., Stelling, J., Ginkel, M., and Gilles, E. D.: FluxAnalyzer: exploring structure, pathways, and flux distributions in metabolic networks on interactive flux maps. Bioinformatics, 19, 261–269 (2003).
17. Stephanopoulos, G. N., Aristidou, A. A., and Nielsen, J.: Metabolic engineering: principles and methodologies. Academic Press, San Diego (1998).
18. Anderson, T. W.: Introduction to multivariate statistical analysis. Wiley, New York (1984).
19. Ikeda, M. and Nakagawa, S.: The Corynebacterium glutamicum genome: features and impacts on biotechnological processes. Appl. Microbiol. Biotechnol., 62, 99–109 (2003).
20. Kalinowski, J., Bathe, B., Bartels, D., Bischoff, N., Bott, M., Burkovski, A., Dusch, N., Eggeling, L., Eikmanns, B. J., Gaigalat, L., and other 17 authors: The complete Corynebacterium glutamicum ATCC 13032 genome sequence and its impact on the production of L-aspartate-derived amino acids and vitamins. J. Biotechnol., 104, 5–25 (2003).
21. Misono, H., Togawa, H., Yamamoto, T., and Soda, K.: Meso-alpha,epsilon-diaminopimelate D-dehydrogenase: distribution and the reaction product. J. Bacteriol., 137, 22–27 (1979).
22. Peters-Wendisch, P. G., Kreutzer, C., Kalinowski, J., Patek, M., Sahm, H., and Eikmanns, B. J.: Pyruvate carboxylase from Corynebacterium glutamicum: characterization, expression, and inactivation of the pyc gene. Microbiology, 144, 915–927 (1998).
23. Park, S. M., Shaw-Reid, C., Sinskey, A. J., and Stephanopoulos, G.: Elucidation of anaplerotic pathways in Corynebacterium glutamicum via 13C-NMR spectroscopy and GC-MS. Appl. Microbiol. Biotechnol., 47, 430–440 (1997).
24. Sonntag, K., Eggeling, L., de Graaf, A. A., and Sahm, H.: Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum. Quantification by 13C- and 1H-NMR spectroscopy. Eur. J. Biochem., 213, 1325–1331 (1993).
25. Kachigan, S. K.: Multivariate statistical analysis: a conceptual introduction. Radius Press, New York (1982).
26. Shaw-Reid, C., McCormick, M. M., Sinskey, A. J., and Stephanopoulos, G.: Flux through the tetrahydrodipicolinate succinylase pathway is dispensable for L-lysine production in Corynebacterium glutamicum. Appl. Microbiol. Biotechnol., 51, 325–333 (1999).
27. Carlson, R., Fell, D., and Srienc, F.: Metabolic pathway analysis of a recombinant yeast for rational strain development. Biotechnol. Bioeng., 79, 121–134 (2002).
28. Burgard, A. P. and Maranas, C. D.: Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng., 74, 364–375 (2001).
29. Burgard, A. P., Pharkya, P., and Maranas, C. D.: Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng., 84, 647–657 (2003).
30. Pharkya, P., Burgard, A. P., and Maranas, C. D.: OptStrain: a computational framework for redesign of microbial production systems. Genome Res., 14, 2367–2376 (2004).
J., and Stephanopoulos, G.: Elucidation of anaplerotic pathways in Corynebacterium glutamicum via 13C-NMR spectroscopy and GC-MS. Appl. Microbiol. Biotechnol., 47, 430–440 (1997). Sonntag, K., Eggeling, L., de Graaf, A. A., and Sahm, H.: Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum. Quantification by 13C- and 1 H-NMR spectroscopy. Eur. J. Biochem., 213, 1325–1331 (1993). Kachigan, S. K.: Multivariate statistical analysis: a conceptual introduction. Radius Press, New York (1982). Shaw-Reid, C., McCormick, M. M., Sinskey, A. J., and Stephanopoulos, G.: Flux through the tetrahydrodipicolate succinylase pathway is dispensable for L-lysine production in Corynebacterium glutamicum. Appl. Microbiol. Biotechnol., 51, 325–333 (1999). Carlson, R., Fell, D., and Srienc, F.: Metabolic pathway analysis of a recombinant yeast for rational strain development. Biotechnol. Bioeng., 79, 121–134 (2002). Burgard, A. P. and Maranas, C. D.: Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng., 74, 364–375 (2001). Burgard, A. P., Pharkya, P., and Maranas, C. D.: Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng., 84, 647–657 (2003). Pharkya, P., Burgard, A. P., and Maranas, C. D.: OptStrain: a computational framework for redesign of microbial production systems. Genome Res., 14, 2367–2376 (2004).