Copyright © IFAC Computer Applications in Biotechnology, Osaka, Japan, 1998
STATISTICAL ANALYSIS OF CARBON LABELING DATA Wolfgang Wiechert, • Michael Mollney, • Dirk Kownatzki •
• Abteilung Simulationstechnik. FB 11. Universitiit-GH Siegen Paul-Bonatz-Str. 9-11. D-57068 Siegen. Germany
Abstract: Metabolic carbon labeling experiments for the detennination of intracellular metabolic fluxes have become a standard tool in Metabolic Engineering. A general framework for the evaluation and statistical analysis of labeling data is presented. Because the problem of flux detennination is in general ill-conditioned special measures have to be taken for numerical parameter fitting and statistical analysis. This contribution explains the nature of this ill-conditionedness and a general solution strategy to regularize the problem. The developed methods are then applied to an example which yields some new results concerning the optimal experimental design of labeling experiments. Copyright © 1998 IFA C Keywords : Metabolic engineering, Flux analysis, Carbon labeling, Modeling, Statistical analysis, Optimal experimental design
I. INTRODUCTION
theory to judge the achieved results (van Heijden et al.,1994).
Metabolic flux detennination under stationary conditions has become a standard tool in Metabolic Engineering (Vanna and Palsson, 1994; Zupke and Stephanopoulos, 1995; Marx et al., 1996). It enables all intracellular fluxes in the central metabolism of a microorganism to be quantitated. Thus flux analysis is an invaluable instrument to judge the effect of genetical manipulations aiming at the improvement of the fonnation of a certain product (Marx et aI., 1998). The necessary condition for flux analysis is the presence of a metabolical steady state during the time span of the corresponding experiment. This can be achieved with high accuracy in a continuously driven bioreactor.
Recently, the amount of available measurement data for stationary flux analysis has been dramatically extended by using carbon labeling data in addition to extracellular fluxes (Marx et aI. , 1996). To this end 13C labeled substrates are fed into the system and the enrichment of 13C isotopes in the intracellular metabolite pools are measured by using NMR. This method has the potential to quantify all intracellular fluxes with almost no crucial assumptions on the biological system (Wiechert and de Graaf, 1997). In particular the rates of bidirectional fluxes corresponding to reversible reaction steps can be accessed in addition to the net fluxes (Wiechert and de Graaf, 1997) (Figure I). This infonnation significantly increases the usefulness of flux analysis for Metabolic Engineering because some more infonnation on the regulatory network can now be obtained.
All fluxes between the central metabolic pathways and the cell mass (or the surrounding medium respectively) can then be directly measured by using standard instruments. These fluxes - henceforth called the extracellular fluxes - fonn the data base for stationary flux analysis by the metabolite flux balancing method (Vanna and Palsson, 1994). It is well known now that the application of this method requires some additional assumptions on energy metabolism (e.g. NADH-balancing) that are considered doubtful. On the other hand there is a well established statistical
From a mathematical viewpoint carbon labeling balance equations are much harder to evaluate than metabolite flux balances because the problem of flux estimation now becomes non linear and of a much higher dimension. A general statistical theory for judging the quality of flux estimates has recently been
325
a)
@..y<-M@
b)
@..,,~c:®
The reason for the introduction of a third coordinate system (the numerical flux coordinates) is the illconditionedness of the exchange flux identification problem for given measured data. Because directly measured fluxes are always net fluxes all the information on exchange fluxes must be contained in the labeling data. The important effect that hinders flux estimation from labeling data is the fact that two metabolite pools A, B that are linked by a high exchange flux asymptotically have the same labcling state (rapid equilibrium). This should be intuitively clear but can be also rigorously proven (Wicchert, 1995). The reason for a possible ill-conditioncdness in case of rapid equilibria is explained now.
Fig. I. Alternative characterization of a bidirectional reaction step by a) forward and backward directed fluxes and b) net and exchange fluxes. established (Wiechert and de Graaf, 1997). It turned out that the evaluation of carbon labeling experiments is in general an ill-conditioned numerical problem. This property has major drawbacks on the statistical treatment too. This contribution summarizes the key results and the algorithms that have been developed in order to resolve the condition problem. The application of these methods will then be demonstrated by an example of medium complexity. Finally some new results on the optimal design of carbon labeling experiments will be presented.
To this end consider the fractional amount x [%1 of l3C labeling at a certain carbon atom position of metabolites A or B. Let us assume for the moment that all fluxes in the network are known by value except for the considered exchange flux v xch linked to that pool. It can then be shown by a mathematical analysis of the labeling balance equations (7) that x essentially exposes a hyperbolic dependence on v xc h , i.e. the relationship between exchange flux and labeling state is generally given by a function (Figure 2a)
x = x (v XCh ) = a .
2. EXCHANGE FLUXES
v
xch
v
xch
+b +d +c
(I)
The new feature of carbon labeling experiments is their ability to quantify bidirectional reaction steps. On the other hand it is this new type of flux that introduces the numerical and statistical problems. To explain the nature of this condition problem three different coordinate systems are now introduced for the description of reversing reaction steps. To this end consider a certain molar metabolic flux v: A -t B between two metabolite pools A and B.
with certain constants a , b, c, d depending on all other fluxes in the system. Since these fluxes are assumed to be known these constants can be computed and an exchange flux estimate from the inverse fonnula is obtained by:
From the viewpoint of mode ling it is most convenient to describe the fluxes in this reaction step by aforward and a backward flux v---+ and v+-- (each with unit [mollh)). The reason is that the carbon label flux balances (7) can be directly written by using these variables. They are henceforth called the natural flux coordinates.
As can be seen from Figure 2a the sensitivity of the flux estimate with respect to the measurement error of x tends to infinity if the exchange flux is large, i.e. if the reaction step is close to equilibrium. This is exactly the case if x comes close to its upper limit a + d. As a consequence large exchange fluxes can only be estimated within large confidence bounds. Moreover, an exact confidence interval as presented in Figure 2a is unsymmetric with a much larger right half. If v xc h becomes very large then the problem is not only a statistical but also a numerical one, i.e. the numerical calculation of v xch becomes unstable if x ;:::; a + d.
v
On the other hand the usual application of stationary flux analysis is the quanti tat ion of net fluxes v net . The new infonnation on bidirectionality is merely an additional feature that should be quantitated separately by an exchange flux v xch . This variable describes the amount of flux taking place in both directions. The precise definition of net and exchange flux (henceforth called the application coordinates) is given by
vnet
= V~
-
v
h
vXc =
~
mill ( V
ex - ab - de
=- x-a-d
This is not just a hypothetical situation because it will really happen when fluxes are estimated from measured labeling fractions by using a least squares approach. According to our experience the measured value of x frequently lies outside its theoretically reachable value range R(x) = [abl e + d,a + d] because R (x) can often be a very small interval. If x lies above a + d then the value of v xc h that best reproduces x in a least squares sense is v xch = 00 (see Figure 2a). Consequently, a parameter fitting
<-
,v )
(Figure I). Clearly, this is a one-to-one coordinate transfonnation. Its inverse is given by the fonnulas: v---+ = v xch - mill (_v net , 0) and v+-- = v xch _ mill
xc h
(v net , 0) .
326
a)
c ~
______'-:.ilJljt_____________ .
x =a
0.6
-
b +d c
(3)
~ I .~.:---;-~==ii
u.
As shown by Figure 2b the calculation of v xc h[O, I] from x is no more ill-conditioned. Moreover, if the value range of vxch[O,I] is restricted to the interval [0,1 J in a parameter fitting algorithm then the flux value v xch[O, I] can be computed without numerical difficulties . From this an almost symmetrical exact statistical confidence interval is obtained for vxc h[O, I] as is shown in Figure 2b. Of course, for v xc h[O, I] ::::::: 1 the reverse transfonnation back to v xch will reestablish the large and unsymmetric confidence region. However, this is done after the parameter fitting took place and thus poses no numerical stability problem.
0)
c
CiT
.o.r. r:l
.•J
"C
.-
Cl)
:l
I/)
0.3
r:J
o 1----<>-----1
Estimated Exchange Flux
b)
(b - 1) v xch[O,I ] (c - 1) vxc h[O.I]
§
0.6
------Limit ..
-----_ .. _----
u T.-
...
0:; •
u.
rn c
3. A GENERAL MODEL
=,..
15.1.
Having the different flux coordinate systems and their use in mind a general model for carbon labeling experiments can now be presented that has been developed in (Wiechert and de Graaf, 1997) in great detail. For this reason the equations are only briefly reproduced in this paper:
~
..J
"C
e V;
....(.l ~
0.3
• •- _ -___.........
'---~
0.0
0.2
0.6
1.0 ~
f-<>-I
Estimated [0,1]-Exchange Flux
(I) The state variables are given by the vectors
Fig. 2. Principle of exchange flux detennination from labeling data : a) If [0, ooJ-scaled exchange fluxes (see Figure I) are estimated the problem is in· creasingly ill-conditioned while b) the numerical and statistical problems vanish for [0, I J-rescaled exchange fluxes (Equation 2).
Turning back to the general case where all fluxes have to be estimated simultaneously, this problem will be even worse. In particular, this critical situation cannot be recognized a priori because the value range R (x) cannot be efficiently computed since it depends on all other estimated flux values too . For this reason special measures have to be taken to establish a stable numerical flux estimation procedure. Clearly, illconditionedness is an intrinsic property of the studied system and thus cannot be removed by mathematical transfonnations . However, it can be removed from the numerical computations by pre-conditioning the system . This is done by introducing a nonlinear transformation for exchange fluxes given by:
v
xch v xch
+1
fractional labeling state of all intracellular carbon pools and the carbon atoms fed into the system
v -+ , v +-
forward and backward fluxes
(2) Fluxes are alternatively represented within three different coordinate systems by
algorithm will tend to infinity and eventually will become numerically unstable.
vxch[o. I] d;;,f
x, x inp
natural flux coordinates as forward and backward fluxes
v-+, v+-
application flux coordinates as net and exchange fluxes
v
numerical flux coordinates as net and [0, 1]-exchange fluxes
vnet, vxch[O, I]
net
, v
xch
The respective coordinate transfonnations defining these alternative coordinate systems are given by
(2)
The transfonnation maps the infinite interval [0,00] to the finite interval [0,1 ]. In particular v xch = 00 is mapped to the finite value vxch[O.I] = 1. The inverse . . xh vxch[O ,I] b " h' trans fionnatlOn IS: v c = -;-:::xcn . Su stltutmg t IS I- v tenn into Equation (I) yields:
d;j
(vXCh v xc h _
0)) .
min (_v net , min ( v net, 0)
(5)
Here 0 is the Zero vector and 1 is the vector with all One's. The vector division has to be
327
taken component-wise and likewise the vector minImum. (3) The carbon label flux balance equations exhibit the general structure
( ~ vi' . Pi' + ~ vr .pr ) .x + (
~ vi' . p:np) .
X
inp
Here the expectation vectors are Zero and the covariance matrices are ~w, ~y.
(6)
4. SIMULATION AND STATISTICAL ANALYSIS
= 0
Using the model equations a carbon labeling experiment can be simulated. This means that the expected fluxes and fractional labels can be computed from the assumed free flux values, i.e. for given 8 . To this end the mappings r, q" q,[O,I], 'IJt defined in Equations (7,5,4,10) are composed to obtain the expected measured net flux vector and the expected measured fractional labeling vector:
with the square atom transition coefficient matrices Pi, Pi , and the input atom transition matrices p:nP. Ifv--+, v+- are known the balance equations are linear w.r.t. x . By solving them x tums out to be a (nonlinear) function of v--+ , v+- : (7)
E (w) = Mw' (1
(4) Various assumptions about metabolite fluxes like metabolic stationarity - including the stoichiometric balances, unidirectionality of reaction steps or rapid equilibria caused by large exchange fluxes - are expressed by two linear flux constraint equations
and
Nxch[O,ll.vxch[O,I] = nxch[O,I]
E (y) = My' r
(8)
K
0
q,[O, I1o III (8) d;j Fy (8)
(8) = Ilw-M w . vncll lEw + lly-My . xllE y = Ilw- Fw (8) IIEw + lly-Fy (8)IIEy
Once having computed the free flux estimate 9 its estimated covariance matrix can be calculated by using a standard linearization argument:
(9)
•
Cov (8)-1 ~
8Fw •
T
~;1
+ (8Fy (9))T
~-1 y
(
88 (8) )
88
(10)
(
8Fw .
88 (8) )
(8Fy 88
(9))
From this matrix an approximate parameter confidence region of multidimensional ellipsoidal shape can be computed by using standard statistical methods (Wiechert, 1995). It is illustrated in Figure 2b that the linearization approximation performs quite well for the [0,1] rescaled exchange fluxes (as used in 8), i.e. the exact confidence region is almost identical with its linear approximation. In contrary this does not hold for the original exchange flux as shown in Figure 2a. A principal components analysis of the covariance matrix by singular value decomposition enables the
(6) Finally, the measured flux and labeling data is described by linear measurement equations as
y=My ' x
q,
III (8) d;j Fw (8)
Herein 11.11 E denotes the weighted norm given by 11( 11 E = (T. ~ -1 -(. The result is a rather complicated optimization problem. The numerically stable solution of this problem is described in (Wiechert, 1995). The stability is achieved by comprising the mappings r,
in such a way that Equations (8) combined with Equation (9) admit a unique solution for v net , vxch[O,I] when 8 is given. Consequently, there is an affine function vnet ) ( vxch[O,I] = 'IJt (8)
0
0)·
Here the projection matrix (1 0) is composed from a zero matrix and a unit matrix. Using these predictions the flux estimate 9 is computed by minimizing the weighted least squares function :
where the net flux constraint matrix Nnet , the exchange flux constraint matrix N xch[O,I], and the constraint value vectors nnet, n xch[O,I] are given and fixed . (5) The free flux vector 8 represents the systems degrees of freedom. These are the fluxes that are not determined by Equations (8) and thus have to be estimated from measured data. They are defined by formulating an arbitrarily third constraint equation net N free . (v v xch[O.I] ) = 8
by multivariate normally distributed random vectors cw, Cy :
+Cy
with the measured flux values w, the measured label fractions y and the measurement Matrices Mw , My. The measurement noise is represented
328
principal axes of the confidence ellipsoid to be determined which gives a more detailed insight into the nature of possible correlations in the estimates (Figure 4). More details on the statistical analysis are given in (Wiechert and de Graaf, 1997).
Because the substrate uptake rate upt can be directly measured its measurement error has little influence on the other estimates (Wiechert and de Graaf, 1997) so that upt = 1 can be assumed for simplicity. There remain two free flux values, the cycle net flux ppp~et and the common exchange flux value ppp;~~~~.I). Finally, only the labeling fractions gap2,gaP3 and e4Pl are considered as possible measurement values each with a measurement standard deviation of 0.0 I. It has been shown in (Wiechert and de Graaf, 1997) that these labels indeed contain the most information about the two free fluxes . The following is concerned with the design of experiments that produce optimal flux information. To start with the analysis the influence of the set of measured values on the quality of the flux estimates is studied. This is done for the special case ppp~Cl = 1 and ppp;C~l~. I) = 0.5 and 1_13C xylose as the substrate. Clearly at least two measurements must be taken to determine the fluxes. Thus one might measure a) gaP2' gap3' b) gaP2' e4Pl ' c) gaP3' e4Pl, or d) all three labels. Figure 4 illustrates the outcoming 95% approximate confidence regions for each combination of measurements as determined from the singular value decomposition of the respective covariance matrix . It becomes clear that measuring gap3 and e4Pl yields a result that is almost as good as taking all three measurements gap2' gaP3' e4Pl . All other combinations of two measurements are not acceptable.
identical pools
This shows that the set of measured labeling fractions must be carefully chosen for carbon labeling experiments. If the available information is not sufficient one might try to obtain further labeling information by extracting other labeled material from the cell mass. This can be a tedious procedure so that the method described above help to predict the expected increase of information. In the example it was useless to take gap2 as an additional value.
Fig. 3. Cyclic pentose phosphate pathway example used to illustrate the optimal design of labeling experiments. Details on the carbon atom transitions can be taken from (Wiechert and de Graaf, 1997).
5. A SIMPLE EXAMPLE The only experimental condition that actually can be varied before the labeling experiment is carried out is the input labeling state of the substrate as given by the vector x inp = (xy1l, ... ,xy1S)T. By mixing differently labeled substrates arbitrary input labeling fractions can be produced. In our example only substrates with all positions completely labeled or unlaP beled (i .e. = 0 or 1) are compared. The reader should notice that using the complementary vector 1inp x as input yields exactly the same result as taking x tnp because this essentially means to replace each 13C atom in the system by a 12C and vice versa. Thus only half of the possible candidates for xinp have to be taken into account. Table I presents the results. It turns out that the 1_13C labeling chosen above is indeed the only one times labeled substrate that produces optimal results.
The procedures explained above are now applied to a simple but nontrivial example. Several more complex examples with real measurement data are given in (Marx et at., 1998). The example is concerned with the cyclic pentose phosphate pathway. The network as shown in Figure 3 is taken from (Wiechert and de Graaf, 1997) with exactly the same nomenclature. It is assumed that the fluxes upt, g1Y1, g1Y2, g1Y3, PPP1, are irreversible while PPP2, PPP3, PPP4, take place in both directions. Biomass effiuxes are left out for simplicity. However it can be argued that their influence on the results is rather small (Wiechert and de Graaf, 1997).
xt
For illustration purposes the example is now reduced to only two free fluxes by assuming the exchange fluxes in the transaldolase and transketolase reactions to be equal, i.e. xchlO,I) _ p'PpxchlO,I) _ p'Pp 2 3 -
xchlO,I)
PPP4
d9
Clearly, all these results depend on the free fluxes that are actually assumed. This means that in order to compute an experimental design before the experiment is
xchlO,I)
- PPP2.3 ,4 .
329
Table I . Parameter standard deviations and Fisher information (Takors et a!., 1998) for different choices of input labeling for the network shown in Figure 3. Input Labeling
2 0 0 0 0 0 0 0 0 0
3 0 I
5
4 0 0
I
0 I
PPFl
0 0 0 0
0
Standard Deviation • net • xch[O, I]
to be singular it is not possible to determine all desired fluxes with the given measurement information. Finally, the sketched methods can be easily extended to isotopomer labeling experiments. To this end the carbon labeling fractions x have to be replaced by the isotopomer labeling fractions and the carbon labeling balance has to be replaced by the general isotopomer labeling balance as has been introduced in (Wiechert, 1995). This equation is mathematically much more complex and in particular it is quadratic with respect to x and thus more difficult to solve (Wiechert et aI., 1997). All other ingredients of the general model presented above remain the same. Since numerical methods for the solution of the isotopomer balance equation and the computation of its derivative are now available (Wiechert, 1995; Wiechert et aI., 1997) the same statistical formalism will work .
Information
PPF234
0.070
0.046
95773
0.225
0.048
9329
0 0
0
I
I
0
0
o
all others
carried out a good guess for ppp~et and ppp~C~I~, I] must be supplied a priori. It is well known that this is a common problem for nonlinear experimental design that cannot be avoided (Takors et al., 1998). However, sensitivity studies can show how sensitive the chosen design is with respect to a variation of the free fluxes. For example it has been shown in (Wiechert and de Graaf, 1997) that gap3 and e4Pl are in any case good candidates for the determination of ppp~et and ppp~~~~~.I] A similar study for the choice of the input labeling may be performed.
This will enable carbon labeling experiments and isotopomer labeling experiments to be compared and to decide in which situation it is advisable to switch to the more complex system. It is currently worked on the answer of these questions .
7. REFERENCES Marx, A., A.A. de Graaf. W Wiechert and L. Eggeling (1996). Determination of the fluxes in central metabolism of corynebacterium glutamicum by NMR spectroscopy combined with metabolite balancing. Biotechn. Bioeng. 49, I I 1-129. Marx, A., A.A. de Graaf, W Wiechert and L. Eggeling (1998). Metabolic fluxes in c. glutamicum for different physiological states resolved by 13C isotope analysis . In: This conference. Takors, R., D. Weuster-Botz, w. Wiechert and C. Wandrey (1998). Model discrimination and parameter identification by an experimental design strategy. In: This conference. van Heijden et aI., R.T.1.M . (1994). Linear constraint relations in biochemical reaction systems : Part I+Il. Biotechnol. Bioeng. 43,3-20. Varma, A. and B.O. Palsson (1994). Metabolic flux balancing: basic concepts, scientific and practical use . Biol Technol. 12, 994-998. Wiechert, W (1995). Metabolische KohlenstoffMarkierungssysteme. Habilitationsschrift, Universitat Bonn. Wiechert, Wand A.A . de Graaf (1997) . Bidirectional reaction steps in metabolic networks. Part I+II. Biotechn.Bioeng. 55, 101 - 135. Wiechert, W, M. Mbllney and M. Wurzel (1997). Modelling, analysis and simulation of metabolic isotopomer labelling systems. In: 15th IMACS World Congress, Berlin, August 1997. Zupke, C. and G. Stephanopoulos (I995). Intracellular flux analysis in hybridomas using mass balances and in vitro 13C NMR. Biotechn. Bioeng. 45,292- 303.
XCh[O,1) ppp 2,3,4
0.8
0.5
measured 0.2
measured gly~et 0.7
1.0
1.3
Fig. 4. Approximate confidence regions for the free fluxes obtained for different combinations of measured intracellular labels. If gap 2 ,gap3' e4Pl are all measured, the result is only slightly improved.
6. FUTURE DEVELOPMENTS The techniques that have been introduced above can help to decide how well intracellular fluxes can be determined and which type of experiment will produce optimal results. Moreover, a general tool has been established that enables the identifiability of fluxes to be analyzed given certain measurement data and input labeling. If the computed covariance matrix turns out
330