Copyright © IFAC Computer Applications in Biotechnology, Garmisch-Partenkirchen, Germany, 1995
INTERACTIVE EVALUATION OF NMR SPECTRA FROM IN VIVO ISOTOPE LABELLING EXPERIMENTS
R. Wittig, M. Möllney, W. Wiechert and A.A. de Graaf
Institute of Biotechnology, Research Centre Jülich, Jülich, Germany
Department of Theoretical Biology, University of Bonn, Bonn, Germany
Abstract. In vivo isotope labelling experiments are of great importance for the quantification of metabolic fluxes. Flux estimation requires the fractional labelling of intracellular metabolites, which must be computed from an in vivo NMR spectrum by determining peak areas. This task is extremely difficult because peak parameters are unknown under non-standard measurement conditions. Moreover, overlapping peaks occur frequently and low signal-to-noise ratios are encountered. A complementary set of spectral analysis tools is presented that can solve the spectral decomposition problem. The program DOSIS integrates these tools into an interactive software framework.
Key Words. Spectrum analysis, Interactive programs, Parameter estimation, Constrained parameters, Approximation, NMR spectroscopy, Mixture analysis, Peak deconvolution
1. INTRODUCTION
Nuclear magnetic resonance (NMR) studies are of increasing importance for the study of microbial metabolism. They make it possible to observe metabolic parameters in vivo without any disturbance of cell function. The use of ¹³C isotope labelling experiments for intracellular flux quantification has long been demonstrated for mammalian organs or cell extracts of microorganisms. Only recently have experimental methods been developed to apply NMR to flux determination for microorganisms in continuous culture (Wiechert et al., 1994). This cultivation mode makes it possible to establish and characterise different well-defined physiological states.
1.1. Peak Quantification
The quantification of intracellular fluxes requires knowledge of the fractional labelling of intracellular metabolites with respect to certain isotopes. ¹H, ³¹P and ¹³C are of main interest for NMR applications in biochemistry. While ³¹P has a natural abundance of 100%, ¹³C has only 1.1% and must therefore be supplied with an isotopically enriched substrate. Fig. 1 shows schematically how an NMR spectrum is composed from the resonance peaks of the contributing carbon atoms. The total amount of label at a certain atom position corresponds to the area under the associated spectral peak. The task of spectrum evaluation is to quantify all peaks that can be identified. After calibration of the areas, the fractional labelling can be computed from them and then serves as input data for flux determination (see also Wiechert et al., 1994).

Fig. 1. Decomposition of a ¹³C NMR mixture spectrum into single resonance peaks of different metabolites R, S, T (axes: frequency vs. height).
1.2. Problems of Biological Samples
When NMR spectroscopy is applied to known pure chemical substances and the measurement takes place under well-defined standard conditions, all resonance frequencies are known exactly in advance. Spectral evaluation poses no problem in this situation. However, in vivo NMR measurements exhibit several problems that make spectral deconvolution much more complicated:
1. Samples from in vivo experiments are usually mixtures of several metabolites, whose concentrations may differ widely. As the number of peaks increases, overlaps become frequent.
2. When a measurement is made inside a bioreactor, the spectrum shows broad lines (de Graaf et al., 1992). This leads to even more complex combinations of overlapping peaks. The peak width cannot in general be predicted a priori.
3. Several factors limit the measurement duration in vivo: i) labelled substrates are extremely expensive; ii) the biological system may change with time; iii) an isotopically instationary state may be observed. The consequence of a limited measurement duration is a low signal-to-noise ratio.
4. The peak positions depend on additional factors that cannot be controlled, such as the intracellular pH value. It is difficult to establish a calibration standard for this deviation.
1.3. Methods for Peak Deconvolution
In this situation, computer-aided methods to identify and disentangle the overlapping peaks are required to extract as much quantitative information as possible. Because the types of in vivo experiments performed vary widely, it is not advisable to work out a completely automated solution to this problem. For this reason, the approach taken here is to supply a complementary set of tools for spectral deconvolution and to integrate them into an interactive software framework. The user can then decompose a complex spectrum in a short time by combining the appropriate methods. This paper describes such a toolset and the program DOSIS, which integrates the tools with a graphical user interface.

2. GENERAL SPECTRUM MODEL
An NMR spectrum is computed by a fast Fourier transformation. The measured spectrum then consists of a data vector

$$M = \big(M(1), \dots, M(2^N)\big)^T$$

(for convenience, the frequency argument runs over the integers). Fortunately, the lineshapes of NMR peaks are known from spectral theory. The measured spectrum $M$ can therefore be explained by a parametric model, which is discussed here in some detail.

2.1. Lineshapes
Each single resonance peak in the spectrum can be approximated by a linear combination of two lineshapes (lorentzian and gaussian) that are shown in Fig. 2. The corresponding generator terms are given by (Galat, 1986):

$$\Phi_L(x) = \frac{1}{1 + x^2}, \qquad \Phi_G(x) = e^{-x^2} \qquad (1)$$

The gaussian lineshape emerges from the lorentzian lineshape through resolution enhancement techniques (Lorentz to Gauss transformation). The lorentzian and gaussian parts of all resonance peaks are henceforth called the elementary peaks.

2.2. Synthetic Spectrum
A spectrum, as a complex superposition of lorentzian and gaussian peaks, can now be explained by a linear combination of appropriately scaled elementary peaks (Fig. 2):

$$S(a, x) = \sum_{i=1}^{n} h_i \cdot \Phi_i\!\left(\frac{x - \nu_i}{w_i}\right), \qquad \Phi_i \in \{\Phi_L, \Phi_G\} \qquad (2)$$

Herein the peak heights are denoted by $h_i$, the widths by $w_i$ and the frequencies by $\nu_i$. The desired peak areas can be computed once $h_i$ and $w_i$ are known, because the area of an elementary peak is $\pi h_i w_i$ for a lorentzian and $\sqrt{\pi}\, h_i w_i$ for a gaussian. The model equation (2) is called the synthetic spectrum. With the combined parameter vector

$$a = (h, w, \nu)^T, \quad \text{where} \quad h = (h_1, \dots, h_n)^T, \quad w = (w_1, \dots, w_n)^T, \quad \nu = (\nu_1, \dots, \nu_n)^T,$$

it will be denoted by $S(a, x)$. Similarly, the vector representing the whole spectrum is given by

$$S(a) = \big(S(a, 1), S(a, 2), \dots, S(a, 2^N)\big)^T.$$

Assuming an additive, normally distributed white noise vector $\varepsilon$, the working hypothesis to explain the measured spectrum is

$$M = S(a) + \varepsilon.$$

2.3. Peak Fitting
In principle, spectrum analysis is a parameter estimation problem. The least squares estimator $\hat a$ of the true parameter is given by

$$\hat a = \arg\min_{a} \|S(a) - M\|^2 \qquad (3)$$

where $\|v\|^2 = v^T \cdot v$. When applied to a mixture spectrum, this estimation problem is far too complex for a straightforward numerical solution: between 20 and 200 parameters must be fitted to up to 16,384 data points. Moreover, the peak fitting problem is known to be ill-determined even when only a few peaks overlap (Dovi et al., 1988).
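The model of Section 2 can be made concrete with a short sketch. The Python code below uses only the standard library. It evaluates the generator terms (1) and the synthetic spectrum (2) sampled at x = 1, ..., 2^N. It also computes the least squares objective of (3) for a simulated measurement M = S(a) + ε and the analytic peak areas π·h·w (lorentzian) and √π·h·w (gaussian). The `ElementaryPeak` class, the function names and all numeric values are hypothetical illustrations and not part of DOSIS.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


def phi_lorentz(x: float) -> float:
    """Lorentzian generator term of equation (1): 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)


def phi_gauss(x: float) -> float:
    """Gaussian generator term of equation (1): exp(-x^2)."""
    return math.exp(-x * x)


@dataclass
class ElementaryPeak:
    """One term of equation (2): height h_i, width w_i, frequency nu_i, lineshape Phi_i."""
    height: float
    width: float
    freq: float
    shape: Callable[[float], float]


def synthetic_spectrum(peaks: Sequence[ElementaryPeak], n_points: int) -> List[float]:
    """Return S(a) = (S(a,1), ..., S(a,n_points)); x runs over the integers."""
    return [
        sum(p.height * p.shape((x - p.freq) / p.width) for p in peaks)
        for x in range(1, n_points + 1)
    ]


def peak_area(p: ElementaryPeak) -> float:
    """Area under one elementary peak over the whole real axis."""
    if p.shape is phi_lorentz:
        return math.pi * p.height * p.width
    if p.shape is phi_gauss:
        return math.sqrt(math.pi) * p.height * p.width
    raise ValueError("unknown lineshape")


def sum_of_squares(s: Sequence[float], m: Sequence[float]) -> float:
    """Least squares objective ||S(a) - M||^2 of equation (3)."""
    return sum((si - mi) ** 2 for si, mi in zip(s, m))


if __name__ == "__main__":
    n_points = 2 ** 10  # N = 10; the text mentions up to 2^14 = 16384 points
    peaks = [
        ElementaryPeak(height=2.0, width=3.0, freq=300.0, shape=phi_lorentz),
        ElementaryPeak(height=1.0, width=5.0, freq=310.0, shape=phi_gauss),
    ]
    s = synthetic_spectrum(peaks, n_points)
    rng = random.Random(0)
    m = [v + rng.gauss(0.0, 0.05) for v in s]  # M = S(a) + epsilon
    print("objective at true parameters:", sum_of_squares(s, m))
    print("areas:", [peak_area(p) for p in peaks])
```

Because x only takes integer values, the sampled sum of a peak approximates its analytic area only when the peak lies well inside the spectral window. Lorentzian tails decay slowly, so truncating the window loses a small fraction of their area.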
3. SPECTRAL DECONVOLUTION
Additional information must be used to solve the problem. It can be taken from prior knowledge about parameters, from physical constraints or from results of mathematical systems analysis. Each technique discussed in this section aims at improving the convergence and determinacy of a parameter fitting algorithm. More details can be found in Möllney (1995).

3.1. Physical Constraints
Based on physical laws, many quantitative relations between spectral parameters are known. They can be expressed as linear equations, which considerably reduces the degrees of freedom:
1. As already mentioned, each resonance peak is linearly composed from the two lineshapes (1). This can be expressed by

$$\Phi_{LG}(x) = h_L \cdot \Phi_L\!\left(\frac{x - \nu_L}{w_L}\right) + h_G \cdot \Phi_G\!\left(\frac{x - \nu_G}{w_G}\right)$$

together with the linear constraints

$$w_L = \sqrt{\ln 2}\; w_G, \qquad \nu_L = \nu_G, \qquad h_L = \gamma\, h_G.$$

The constant $\gamma$ is known. The factor $\sqrt{\ln 2}$ places the half-maximal points of both parts at the same position: the lorentzian part reaches half height at $|x - \nu_L| = w_L$, the gaussian part at $|x - \nu_G| = \sqrt{\ln 2}\, w_G$.
2. When there are scalar couplings between neighbouring atoms that are both labelled, a carbon resonance splits up into several peaks (Fig. 3). They are grouped symmetrically around a central frequency. Such a peak group is called a multiplet. The same type of equation holds for the lorentzian as well as the gaussian part:

$$\Phi_{Mult}(x) = \sum_{i=0}^{K} h_i \cdot \Phi\!\left(\frac{x - \nu_i}{w_i}\right)$$

The physical relations for a multiplet are equidistant frequencies and equal widths:

$$\nu_i - \nu_{i-1} = \nu_1 - \nu_0, \quad i = 2, \dots, K, \qquad w_i = w_0, \quad i = 0, \dots, K.$$

For weakly scalar coupled spin systems, the peak heights are related by binomial coefficients:

$$h_i = \binom{K}{i} \cdot h_0, \quad i = 0, \dots, K.$$

3. The experimenter may wish to control the estimation procedure by fixing certain parameters ($h_i = \text{const}$, $w_j = \text{const}$, $\nu_k = \text{const}$) or the relative position of two peaks from one substance ($\nu_i - \nu_j = \text{const}$).

In matrix notation, the linear relations arising from 1.–3. can be combined into one linear system with coefficient matrix $L$. This system is treated numerically by singular value decomposition of $L$, which expresses $a$ in terms of a lower-dimensional vector $\alpha$.

Fig. 3. Different types of peak multiplets (shown: doublet, quartet).

3.2. Separation of Linear Parameters
In least squares optimization problems, linear parameters can be separated from nonlinear ones. Here the height parameters $h_i$ are linear, so that (3) reduces to

$$\min_{h, w, \nu} \|S(h, w, \nu) - M\|^2 = \min_{w, \nu}\, \min_{h} \|S(h, w, \nu) - M\|^2 = \min_{w, \nu} \tilde S(w, \nu)$$

where $\tilde S(w, \nu)$ is the explicitly known solution of the inner minimization problem, obtained by applying a generalized inverse matrix. This considerably reduces the dimension of the nonlinear optimization problem, since only the widths and frequencies remain to be searched.

3.3. Fast Computation by Tabulation
Any optimization procedure must repeatedly evaluate the term

$$\|S(a) - M\|^2 = (S(a) - M)^T (S(a) - M) = S(a)^T S(a) - 2\, S(a)^T M + M^T M$$

and its derivatives with respect to $a$. Computing $S^T S$ and its derivatives requires $2^N \cdot (1 + \dim a)$ evaluations of lineshape functions per optimisation step, which can be extremely time consuming. Using equation (2), we get

$$S(a)^T S(a) = \sum_{i=1}^{n} \sum_{j=1}^{n} h_i h_j \sum_{x=1}^{2^N} \Phi_i\!\left(\frac{x - \nu_i}{w_i}\right) \Phi_j\!\left(\frac{x - \nu_j}{w_j}\right)$$

The inner sum can be approximated by means of a function $g(\cdot, \cdot)$, which can be precomputed and tabulated. This technique has been shown to reduce the computing time by several orders of magnitude (Möllney, 1995). A similar measure can be taken for the derivatives with respect to $a$ and for the term $S(a)^T M$, while $M^T M$ is a constant.
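The multiplet and Lorentz–Gauss relations of Section 3.1 can be illustrated by generating all peak parameters from a small reduced vector. This sketch departs from the text: instead of the singular value decomposition of L, it substitutes the constraints directly. A multiplet of K + 1 peaks, which has 3(K + 1) free parameters, is then built from four values: h_0, w_0, a central frequency and a spacing parameter. The class, the function names and the `spacing` parameter are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

SQRT_LN2 = math.sqrt(math.log(2.0))


@dataclass
class Peak:
    height: float
    width: float
    freq: float
    shape: str  # "L" (lorentzian) or "G" (gaussian)


def multiplet(h0: float, w0: float, nu_center: float, spacing: float, k: int,
              shape: str = "L") -> List[Peak]:
    """K + 1 peaks with binomial heights h_i = C(K, i) * h0, equal widths w_i = w0
    and equidistant frequencies placed symmetrically around nu_center."""
    return [
        Peak(math.comb(k, i) * h0, w0, nu_center + (i - k / 2.0) * spacing, shape)
        for i in range(k + 1)
    ]


def lorentz_gauss_pair(h_g: float, w_g: float, nu: float, gamma: float) -> List[Peak]:
    """Both parts of one resonance, linked by w_L = sqrt(ln 2) * w_G,
    nu_L = nu_G and h_L = gamma * h_G (gamma is the known constant of Section 3.1)."""
    return [Peak(gamma * h_g, SQRT_LN2 * w_g, nu, "L"), Peak(h_g, w_g, nu, "G")]


if __name__ == "__main__":
    quartet = multiplet(h0=1.0, w0=2.0, nu_center=100.0, spacing=7.0, k=3)
    print([p.height for p in quartet])  # binomial pattern 1:3:3:1
    print([p.freq for p in quartet])
    print("free parameters:", 3 * len(quartet), "-> reduced parameters:", 4)
```

Substituting the constraints by hand only works for these simple patterns. The SVD-based treatment described in the text also covers arbitrary user-defined linear constraints.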
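The separation of linear parameters (Section 3.2) can be sketched for lorentzian peaks and a noise-free test spectrum. For fixed widths and frequencies, the inner minimisation over h is an ordinary linear least squares problem. Here it is solved through the normal equations (ΦᵀΦ)h = ΦᵀM, which give the generalized-inverse solution when the peak columns are linearly independent. The function names and test values are hypothetical. A robust implementation would rather use a QR or singular value decomposition, because normal equations square the condition number.

```python
from typing import List, Sequence, Tuple


def phi_lorentz(x: float) -> float:
    return 1.0 / (1.0 + x * x)


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("normal equations are singular: peaks not separable")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        h[r] = (m[r][n] - sum(m[r][c] * h[c] for c in range(r + 1, n))) / m[r][r]
    return h


def reduced_objective(widths: Sequence[float], freqs: Sequence[float],
                      data: Sequence[float]) -> Tuple[List[float], float]:
    """Inner minimisation over the heights for fixed (w, nu).
    Returns the optimal heights h* and the residual ||S(h*, w, nu) - M||^2."""
    xs = range(1, len(data) + 1)
    cols = [[phi_lorentz((x - nu) / w) for x in xs] for w, nu in zip(widths, freqs)]
    n = len(cols)
    gram = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
    rhs = [sum(p * d for p, d in zip(cols[i], data)) for i in range(n)]
    h = solve_linear(gram, rhs)
    fitted = [sum(h[i] * cols[i][k] for i in range(n)) for k in range(len(data))]
    return h, sum((f - d) ** 2 for f, d in zip(fitted, data))


if __name__ == "__main__":
    widths, freqs, heights = [4.0, 6.0], [200.0, 215.0], [2.0, 0.5]
    data = [sum(h * phi_lorentz((x - nu) / w) for h, w, nu in zip(heights, widths, freqs))
            for x in range(1, 513)]
    h_opt, res = reduced_objective(widths, freqs, data)
    print("recovered heights:", h_opt, "residual:", res)
    # Only the nonlinear parameter nu_2 is searched; the heights follow from the inner solve.
    best = min(range(205, 226),
               key=lambda nu2: reduced_objective(widths, [200.0, nu2], data)[1])
    print("best nu_2 on grid:", best)
```

The outer search (here a crude grid over one frequency) never has to consider the heights, which is the dimension reduction described in Section 3.2.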
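The text does not state the exact form of the approximation for the inner sum, so the following sketch assumes one plausible form. Replacing the sum over x by an integral and substituting t = (x − ν_j)/w_j gives Σ_x Φ_i((x − ν_i)/w_i) Φ_j((x − ν_j)/w_j) ≈ w_j · g(s, r). Here s = (ν_i − ν_j)/w_j, r = w_i/w_j and g(s, r) = ∫ Φ_i((t − s)/r) Φ_j(t) dt. The code tabulates g once on a grid using the trapezoid rule and later evaluates it by bilinear interpolation. The grid ranges, step sizes and the `OverlapTable` class are hypothetical choices. One table would be needed per pair of lineshape types.

```python
import math
from typing import Callable, List

Shape = Callable[[float], float]


def phi_gauss(x: float) -> float:
    return math.exp(-x * x)


def overlap_integral(phi_i: Shape, phi_j: Shape, s: float, r: float,
                     t_max: float = 6.0, dt: float = 0.02) -> float:
    """g(s, r) = integral of phi_i((t - s) / r) * phi_j(t) dt, trapezoid rule on [-t_max, t_max].
    t_max = 6 suffices when phi_j is gaussian; heavy lorentzian tails need a wider range."""
    n = int(round(2.0 * t_max / dt))
    total = 0.0
    for k in range(n + 1):
        t = -t_max + k * dt
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * phi_i((t - s) / r) * phi_j(t)
    return total * dt


class OverlapTable:
    """Precomputed g(s, r) on a regular (s, r) grid with bilinear interpolation (hypothetical helper)."""

    def __init__(self, phi_i: Shape, phi_j: Shape, s_min: float, s_max: float, ds: float,
                 r_min: float, r_max: float, dr: float) -> None:
        self.s_min, self.ds, self.r_min, self.dr = s_min, ds, r_min, dr
        self.ns = int(round((s_max - s_min) / ds)) + 1
        self.nr = int(round((r_max - r_min) / dr)) + 1
        self.values: List[List[float]] = [
            [overlap_integral(phi_i, phi_j, s_min + a * ds, r_min + b * dr) for b in range(self.nr)]
            for a in range(self.ns)
        ]

    def lookup(self, s: float, r: float) -> float:
        u = (s - self.s_min) / self.ds
        v = (r - self.r_min) / self.dr
        if not (0.0 <= u <= self.ns - 1 and 0.0 <= v <= self.nr - 1):
            raise ValueError("(s, r) lies outside the tabulated range")
        a, b = min(int(u), self.ns - 2), min(int(v), self.nr - 2)
        fu, fv = u - a, v - b
        g = self.values
        return ((1 - fu) * (1 - fv) * g[a][b] + fu * (1 - fv) * g[a + 1][b]
                + (1 - fu) * fv * g[a][b + 1] + fu * fv * g[a + 1][b + 1])


def cross_term(table: OverlapTable, nu_i: float, w_i: float, nu_j: float, w_j: float) -> float:
    """Approximate sum_x phi_i((x - nu_i) / w_i) * phi_j((x - nu_j) / w_j) by w_j * g(s, r)."""
    return w_j * table.lookup((nu_i - nu_j) / w_j, w_i / w_j)


if __name__ == "__main__":
    table = OverlapTable(phi_gauss, phi_gauss, -6.0, 6.0, 0.25, 0.5, 2.0, 0.125)
    direct = sum(phi_gauss((x - 200.0) / 5.3) * phi_gauss((x - 203.5) / 4.0)
                 for x in range(1, 513))
    print("tabulated:", cross_term(table, 200.0, 5.3, 203.5, 4.0), "direct sum:", direct)
```

Once the table exists, each cross term costs one interpolation instead of a sum over 2^N points. The gaussian–gaussian case is used here only because it can be checked against a closed form. For lorentzian–gaussian pairs, g is a Voigt-type profile without an elementary closed form, which is where tabulation pays off most.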
3.4. Restricting the Region
In a spectral region without peaks, the spectral values are pure noise. Such regions contain no information for parameter fitting. They can be identified automatically by smoothing techniques and excluded from the data. In the same way, data analysis can be concentrated on selected peak groups.

3.5. Prior Knowledge
Prior knowledge about spectral parameters can be stored in a database. It may contain frequencies under standardised measurement conditions, values from the literature, results from previous experiments under similar conditions or known calibration curves for pH-induced shifts. Using such a database, many peaks of an in vivo spectrum can be identified qualitatively. Moreover, the stored data can be used as starting values for the parameter fitting algorithm. A spectral database becomes increasingly useful over time because it accumulates the knowledge of the experimenters.

4. INTERACTIVE DATA ANALYSIS
The program DOSIS (Data Base Oriented Spectral Intensity Analysis Software) was designed to integrate all of the techniques described above into an interactive framework. DOSIS runs on a UNIX workstation with Motif as its graphical user interface. It has been implemented in C++ with extensive use of object-oriented technology. The central data structure of DOSIS represents the current state of data analysis by a synthetic spectrum. All of its parameters can be accessed via a hierarchically structured graphical user interface (compare Fig. 1). The available spectral analysis methods can be picked from a toolbox and run on all or part of the data. Each method that can be parametrised in a purely graphical way is supported by an appropriate visualisation (model-view-controller architecture). For example, the starting values for parameter fitting can be set by interactively positioning and resizing lineshapes. DOSIS stores prior knowledge in an object-oriented database. The user can specify all linear constraints graphically. Zero regions of the spectrum are detected automatically. All methods described above have recently been implemented and integrated into the system. An improvement in computation speed by several orders of magnitude compared to standard methods has been verified.

5. CONCLUSION
A powerful spectral decomposition program (DOSIS) has been developed that makes it possible to quantify overlapping peaks in NMR spectra from in vivo experiments. It integrates a set of sophisticated, complementary tools that allow the available information to be fully exploited. DOSIS has been applied to the evaluation of various NMR mixture spectra. Compared to other available tools, it considerably reduced the time required for data analysis. A method for automatic initial value determination is currently under development. It is based on results from an experimental design study combined with an associative memory module. Further work on automatic peak identification using the database is planned.

6. REFERENCES
de Graaf, A.A., R.M. Wittig, U. Probst, J. Strohhäcker, S.M. Schoberth and H. Sahm (1992). Continuous flow NMR bioreactor for in vivo studies of microbial cell suspensions with low biomass concentrations. J. Magn. Reson.
Dovi, V., E. Arato et al. (1988). Determination of unresolved peaks via a new nonlinear regression procedure. Anal. Chim. …
Galat, A. (1986). Computer-aided analysis of infrared, circular dichroism and absorption spectra. CABIOS 2(3), 201–205.
Möllney, M. (1995). Rapid peak deconvolution for NMR mixture spectra (in German). Diploma Thesis, University of Bonn.
Wiechert, W., A.A. de Graaf and A. Marx (1994). In vivo stationary flux determination using ¹³C NMR isotope labelling experiments. In: Computer Applications in Biotechnology, CAB 6. This conference.