APPLICATION OF BILINEAR MODELS TO BIOLOGICAL PROBLEMS. AN APPROACH TO MECHANISM INFERENCE AND ANALYSIS Y. Pagnotte and A. Pave Universile Claude Bemard, Lyon, France and GRECO "Analyse des systemes" (CNRS, RhOne Alpes Region) Abstract. Volterra differential equations (i.e. bilinear autonomous models) are often used to modeling biological, and ecologi.cal systems. In the framework of a kinetic theory we find that only a subset of these equations is avalaible to explain such mathematical models in terms of mechanisms. A biological example is studied and shows how we are led to include complementary hypothesis to explain experimental data: "secret variables". Keywords. Biology; Computer aided-design; Data reduction and Analysis Ecology; Identification; Nonlinear equations; systems analysis.
I.
INTRODUCTION So, we shall study both the problem of structure identification (search of possible equivalent mechanism) and that of numerical identification (parameters estimation) with statistical properties.
The importance of bilinear models has been noted in many fields of scientific activity, particularly for process control (Mohler, 1973). Mathematically they are derived from the classical autonomous Volterra models, primarily built to study problems of populations dynamics. These models were looked by the authors - especially by Lotka (1956) as approximations of nonlinear models. But we may note that such an approximation has been formally studied a few years ago (Brockett, 1972). We could show that it is possible to consider them as models derivated from a formal theory : the chemical kinetics theory. This theory is usable to study and explain some biological and also ecological problems. In this point of view, multilinear differential equations (the mathematical model) are considered as obtained from global mechanisms, i.e. descriptive models, written using chemist's language (Pave, 1977).
2. BILINEAR MODELS AND FORMAL KINETICS The formal kinetics in chemistry allows to study quantitative relations between molecules ; it proposes hypothesis on molecular mixture evolution, particularly when the molecular "mixture" is homogeneous in a domain. The hypothesis (or axioms) have been obtained from experimental considerations such as mass action law, and from other theories such as the theory of molecular shocks (cf. appendix for details). As mentioned above, it is possible to consider that other systems, differents of chemical systems, s eem to follow analogous laws. We have proposed to call them "like chemical systems" (Pave and Pagnotte, 1977). They can be found in biochemistry, molecular and cellular biology, and in ecology.
The reduction of a priori multilinear equations to bilinea r equations can be justified both theoretically (in terms of probabilities~ and practically from experimental considerations. So,our goal is to propose, if it is possible, a mechanisms permitting the explanation of sets of bilinear differential equations. The term "mechanism" must be seen as global mechanism or equivalent mechanism in the same way that the definition of equivalent circuits in electronics. The goal of the approach is to obtain "significative" phenomena in the framework of a formal theory. This approach is analogous to some statistical analysis, such as factorial analysis or correspondance analysis which give picturffi of complex sets of data. In our case, the picture is a s e t of productions (or reactions) which summarizes a system work.
All these systems have in common to be built from particles (elementary components). These particles can be grouped in species, r elat i ve to th e ir properties. For example, in ecology, we can consider animals as particles, in microbiology bacteries, in mole cular or cellular biology macromolecules. Note that in a particular system, the definition of particles is not based on scale considerations ; for example, the growth study of microorganisms in a limited artificial environment leads us to consider them as particles, interacting with alimental
165
Y. Pagnotte and A. Pave
166
particles such as glucose molecules. In other terms, particles and related species are defined relatively to their action. A neutral set of particles is not considered (for example, the gelose particles for microorganism growth). Many of these systems can be studied with a good approximation using the differential equations of chemical kinetics (i.e. multilinear differential equations) if we consider only variations of numbers of particles - or concentrations - of each involved species (cf. appendix for general formulation). Besides, if we suppose that the frequency (or probability) of more than two particles interaction is negliglible, then the associa-
t ed differential set of equations is bilinear.
Then the associated differential system is easily obtained (cf. appendix). q d Xl X. E a lj J
j=i+1
Ylij
Suppose that a system is a set of species XI' X2 ' ..• Xq . So we can consider the maXLmum set of possible interrelations (using the chemist's notation)
kl
XI
-
k X
q
q
nil
XI +
...
+ n
n ql
XI +
...
+ n X qq q
X q
lq
(transitions/productions/reproductions). 2 XI
-
k' I
...
n J I XI +
+ n'
Iq
X q ( I)
k'
2 X q
---S
n' XI + ... + n' X ql qq q
(homogeneous interactions/intraspecific competitions or cooperations).
-
k" I
n" XI + II
...
+ n" X Iq q
XI + X3
k" 2 " -n21 XI +
...
+ n"
X q
X. + X. L J
k" -r n r"l XI +
...
+ n"
X q
k"R . + X _ n~1 X XI + q q-I
...
+ n"
X q
XI + X2
2q
rq
Rq
(heterogeneous interactions/inter specific competitions or cooperations). 1
r is the range of the reaction of X. and X. with i
.
r = (i - I)(q -~) + j - i, and the maximum range R is R = q(q - I) 2
ej
X~ + J
q-I E
i=1 (2)
J
l
I, 2,
... ,
q
Where
aU
=
kl(nU - I) and alj
k. n J jl
for l ! b
U
kl (nil. - 2) and blj
(3)
k! njl J
for l ! c
The demonstration of this result, based on axioms and properties given in the appendix, is easy.
j=1
b
x.L x.
A system where this hypothesis is assumed to be true is named : i'es t ricted li ke chemical
system .
q + E
lr of r
= Y , and with the previous definition lij
c. Lr
k (n". - I) c. = k (n . - I) and 'Jr r rJ r rL
c lr
kr n~l
for l ! i, l ! j
The equation (2) is the general expression of a Volterra differential system (i.e. bilinear system). It represents the variation rate of the species X. (i.e. the number of particles or concentrktion) in a r estricted like chemical system. As above me ntioned, such systems appear very often in the reality ; for example, if we consider the actual set of studied chemical reactions, which is very important, only five of them are concerned with three particles (Schaal, 1971). Our approach doesn't integrate the diffusion problem which is generally important. In fact, this problem can be approximated using compartmental systems (Glansdorff and Prigogine, 1971 ; Jacquez, 1972). In this case, the problem is the same that of transition (i.e. the name of a species may be distinguished according to the geographical situation in a compartmental system). 3. STRUCTURE IDENTIFICATION In our sense, the structure identification problem is to propose a set of differential equations from a set of experimental data, giving a good representation of experimental results and then a picture of mechanisms occurring in the system. We assume that we study a restricted like chemical system, then differential equations are bilinear as defined above and can be written from the l ist of components of the system (or variables). So, two complementary approaches can be defined : i) The analytical one consists to verify, if it is possible (i.e. if the properties of solutions of the set of differential equations can be studied), that the mathematical model is coherent. That is to say
Applic~tion
of bilinear models
that there is at less one solution giving a qualitative good representation of the results. ii) The numerical one has to calculate estimators of the terms of matrix DI (cf. section 4). A priori we consider the complete matrix DI (definition in the appendix), then we have a maximal representation. But, generally, all possible mechanisms don't occur in a real system and we propose the elimination of the terms which are not significantly different from zero (For example after statistical considerations). Then, we obtain another representation called minimal representation. The analytical coherence may be studied, but if we have a good estimations of parameters, this condition is generally obtained. Another condition of goodness in the frame of our theory is to have the possibility of mechanism inference ; that is to say : the terms of matrix DI must verify the following conditions, derived from (3) : If DI is denoted DI = [AIBIC]
A
the matrix of linear components,
1S
B is the matrix of square components, C is the matrix of rectangular components, and we must have : a ..
~
0 for i I j, real on the first diagonal
~
0 for i I j, real on the first diagonal
~
0 for l I i and l I j, else real.
1J
b .. 1J
c 1r
We may denote that more conditions should be advanced for column components which are products of integers with a same real number for a given column of DI.In reality, we can consider the case of "competitive reactions" (i.e. reactions with the same left hand sides and different right hand sides). We can show easily that such reactions lead to a global picture
x.1
+ X. J
which summarizes all parallel competitive reactions, but VI' V ' ..• ,V q are real positive 2 numbers. On these bases, a translation algorithm can be drawn to find the description of the system from the matrix DI (i.e. the list of significative transitions or interactions, see for example our paper: Pave and Pagnotte, 1977). Then we obtain a set of "equivalent" reactions which gives a picture of the involved mechanisms occuring in the studied system in the same sense that equivalent circuits in electronics. If the system is not coherent, on the basis of analytical arguments, or if numerical conditions above mentioned are not verified, then two ways may be defined i) to conclude that the system cannot be considered as a restricted like chemical system, and to search other solutions to explain I.S.-
L
167
experimental results. ii) If we have good information on the nature of the system, in the sense that we can consider it a priori as a restricted like chemical system, then the mathematical model must be modified or completed. In some cases, such an approach leads to assume the existence of variables not studied in the experience (cf. our biological example in section 5). Such variables may be named
"secret variables". 4. NUMERICAL IDENTIFICATION A set of chemical or like chemical equations, according to our previous hypothesis, is governed by the bilinear differential system (2). The parameters ali, bt;, Vlij are linear functions of the rate const~nts (or multilinear functions). In a given application, much of them may be equal to zero. An identification algorithm has to compute estimators of these parameters from a set of experimental data. And the identification is achieved when the rate constants are estimated. A lot of nonlinear methods have been suggested for solving this problem. Nimmo and Atkins (1974) proposed a nonlinear regre$ion procedure in the case of an irreversible one substrate enzymatic reaction without product inhibition. In the same time, Fernley (1974) has described a procedure which is unapplicable too in case of competitive product inhibition. This method has the disadvantage that the data are not fitted to the rate equations, but to a polynomial which is a best approximate of them. Darvey et al. (1975) have suggested a procedure for nonlinear regression in the case of a one substrate-one-product system. Under general assumptions, Bellman et al. (1967) transformed the problem of estimating parameters within the equations into one in which they estimate the initial conditions for a system of differential equations. This is performed with the quasi-linearisation technique which needs the evaluation of the Jacobian matrix. But this method is known to be unstable in many cases. Duggleby and Morrison (1977) developed an interesting method. From the differential rate equations, based on the Michaelis-Menten model, they obtain the formal equations giving the partial derivatives of the concentrations with respect to the rate constants to be estimated. So this formula allows to use a Newton-Gauss nonlinear regression procedure. But in some cases, this method can diverge and if the step size is restricted to prevent divergence, this method can be slow. In other cases, it can by-pass the searched minimum and wander off into an unrealistic part of the parameter space. Chandler et al. (1972) proposed a method based on Marquardt's algorithm. Initially,
168
Y. Pagnotte and A. Pave
the method is near to the steepest descent, but when the minimum approaches, it becomes nearly pure Gauss-Newton. However there is no known best method. Wagner et a1. (1975) have given a very efficient algorithm for an optimal choice of the experimental data in order to get a better precision in the resulffi Victor et al. (1973) presented a program finding the best fit of the rate constants in the case of a single step mechanism. The program computes directly the values from the relaxation time and from the initial concentrations ; it requires no progress curve. But the equation (2) can be transformed d X. s ~ L g. fm(t), i = I, 2, ... , q m=1 ~m dt
population (treated to remain younger during a long time). The values q2(t) are related to a control population having a normal evolution. The both curve shapes seem to a graphical representation of a solution of a KostitzinVolterra equation dQ dt
aQ - bQ
dH dt
k'Q
where Q represents the RNA rate, and c=k.k'. Prudhomme (1976) estimated the experimental error on the measurements of ql and q2 as ± 10 %, assumed to be normally distriouted : we obtain estimators precision by simulation. Applying our method to the equation (6), it was found
with: s = q(q + 3) It is possible to obtain the parameters gim fitting the data to the equation (4). But
d X·
numerical evaluation of ~ are very un-
i=I,2, ... ,q
As the functions f (t) are linearly independent, the method c~nsists in fact to ded X'
' on a po l ynom~al base.
5. EXPERIMENTAL RESULTS Let us consider the silk gland act~v~ty of the silkworm. During the last larval instar, it is principally oriented to produce silk proteins. However it is well known that protein synthesis is intracellular RNA-dependent. Prudhomme (1976) has studied the evolution of the rate of RNA in the gland during the last instar of the worm. Experimental results are plotted on fig. I.
(0
0.029)
0.02553 (0
0.061)
0.01420 (0
0.00076)
2
(0
0.067)
0.01514 (0
0.026)
(0
0.062)
0.6390 0.4316
The comparison between experimental data and simulated curves using the estimates of a, b, c show a good agreement (fig. I). For q2' the agreement would be better if the number of experimental points would be greater. The estimates of al and a2 (resp. bl and b2) are not significatively different. But this is not the case for cl and c2' We have shown in a previous paper (Pave and Pagnotte, 1977) that the equation (6) can be looked as the following like chemical system:
-a
b k
Q -
2 Q
(i)
Q +H
(H)
k' Q + H -
H
(iii)
Reaction (i) looks like an autoreproductive limited mechanism for the RNA. But RNA is well known to be not autoreproductive. In fact, the RNA is copied on an autoreproductive molecule : the DNA. And this reaction summarizes these two mechanisms. Reaction (ii) represents the production of a compound H which,in (iii), interacts with Q to degrade it. Other experimentations suggest that H is an RNA degradative enzyme called RNAase.
1
The values ql(t) are concerned with a treared
0.6608
C
Q
So, the identification problem is transformed into a linear system resolution problem. Theoretical results on the reliability of the method (confidence intervals on the estimators, ... ) are not yet available. This procedure is very simple and takes into account the nonlinear form of the equatiQn (2) without linearizing assumptions.
e)
al b
l cl a 2 b 2
2
f
(6)
Q(u) du
o
X I(t) X (t)} qq
reliable. So, the equation (5) is more convenient than equation (4) s t nf (t) dt (5) Xi(t n ) - Xi(t o ) = L t m m=1 o
ft
dQ = aQ - bQ 2 - cQ dt
(4)
2 Xq(t), XI(t) X2 (t), XI(t) X3 (t), ... ,
~ ~
- k.QH
or :
where {fm(t)} is a set of s elementary functions 2 {fm(t) } = {XI (t), ... ,Xq(t), XI(t), ... ,
compose
2
With a gradient method, it was found a = 0.68, b = 0.0299, c = 0.0136.
169
Applications of bilinear models So the difference between cl and c2 suggest two possible mechanisms : - the production of RNA is more important with the control population, - and/or the degradation of the RNA by the enzyme is more important with the control population than with the treated one. Experiments are now led in our laboratory for testing these hypothesis.
APPENDIX
1
Representation of like chemical system
A like chemical system can be denoted as set of "reaction" R written following the implicit syntax used by chemists : R =
~ nil
x~f)
+
...
+ n.
~
6. CONCLUSION Our approach permits to analyse experimental data in terms of mechanisms, it includes statistical considerations on significance of numerical estimations of model parameters. We have pointed out the necessary conditions for a bilinear differential set of equations to be explained in terms of elementary reactions: transitions and interactions.
X(r) + m il i l I
i
~
1.
---+
~
X~r)
+ m.
~r. ~
~r.
~
N r
where r ~s the number of parallel simultaneous "reactions". are the species of particles in the left hand side of the ith reaction.
If a description of a system is seen as a sentence of a language (like chemical language) we have defined implicitly algorithms to translate such a sentence in terms of bilinear differential equations and converscly to translate, under above-mentionned conditions, a "bilinear model" in a mechanism model (this language point of view may be compared to theoretical results of Fliess (1975)). Such a mechanism summarizes the system work, obviously not in detail (as mode~ it is not a copy of reality) ; in this sense we propose to speak of equivalent mechanism, as electronician speaks of equivalent circuit. This development doesn't include explicitely some problems as diffusion, however it is possible to have available approximation using compartmental models (Jacquez, 1972), which are, in fact, particular case of kinetic theory. Conversely, the restriction on parameter values of differential equations permits elimination of some general solutions often admitted in the frame-work of dynamics of populations (Keyfitz, 1968), particularly for the linear t e rms, then the meaning of parameters seems to be more accurate (look, for example, the terminology used in (I)).
'" N
...
i
k.
X(.t) it.
(i~j~t)
i are the species of particles in the right hand side of the i th reaction. (i"",j "", r ) i
a positive integer, is the number of particles of species xi" in the left hand side of the ith re~ction. also a positive integer, is the number of particles in the right hand side of the i th reaction. k.
~
are positive real numbers, they are interpreted as constant rates of the reactions.
Such reacti~ns can be read as interqctions of ni; part~cles of each species X,~) in the left hand s~des (I '" j ~ ti), the ~J disappearance of these particles products mik particles of species X(r) in the right hand ik sides I "'" k "'" r ~..
2 - Associated differential system 2.1, Hypothesis
Finally, we must note that bilinear models give good approximation of nonlinear models (2) so, someone may say that our method cannot give the r ea l mechanism . in fact as we have already mentioned our g~a l is no~ to give an exact description of reality, but a tool to analyse complex situations. For this reason we speak of equivalent mechanism. So we think that our approach can be a usable ' tool to mechanism study and also to detect "secret variables", of which the existence appears necessary to be postulate and included in the mathematical model to explain data through our model analysis. Obviously, the meaning of "secret variables" depends on the practical problem. So it is possible to see this method as a data analysis method, that is to say a method which permits to summarize experimental information and to give some global structure in terms of mechanisms.
. for a reaction i we have the relation
dX~r) ~
m.
~r"
dt
v.
~
~
where Xij are the concentrations (or the numbers) of particles j (at time t) in a system where occurs only reaction i. \~e have used the same relation for the names of species and the related quantity, in this sense they can be seen as identifiers of variables as in programmation languages.
Y. Pagnotte and A. Pave
170
. V. is the rate of the reaction ~
V.
k
~
(X~. l2»ni2
(x (l»nil
il
i
~
If a speci~Xj is present on the both sides of a same re-action then dX .. ~J
~
m .. - n .. ~J
~J
V.
~
If Xj is not in the right hand side (resp. in the left hand side) then m. . = 0 (resp. ~J
n .. = 0) ~J
For Nr parallel simultaneous reactions, the variation rate of X. is : J
N
dX. --.1. dt
r
E
i=1 2.2. Associated differential system of a set of reactions (i.e. a like chemical system). We have found that dX dt
= D V
where X is the vector of species, V the vector of reaction rates and D a matrix of element dij difference of associated coefficient of the specie Xi in the jth reaction: d .• = m. . - n .. ~J
~J
~J
be carefull the columns of D are associated to reactions and lines to compounds. or,
dX st
=DKT
Duggleby R.G., Morrison J.F. (1977). Biochim. Biophys. Acta, 481, 297. Fernley H.N. (1974). EurQ?: J. Biochem., ~, 377. Fliess M. (1975). Un outil algebrique : le! series formelles non comrnutatives. Proc. of the intern. Symp. on math. system----theory, Udine. Glansdorff P., Prigogine I. (1971). Structure stabilite et flucturations. Masson Paris. Jacquez J. (1972). Compartmental system analysis in biology and medicine. Elsevit Keyfitz N. (1968). Introduction to the mathematics of popUlation. Addison. Weslt Lotka A.J. (1956). Elements of mathematical biology. Dover, New York. Mohler R.R. (1973). Bilinear control Processes. Acad. Press. Nimmo I.A. (1974). Biochem. 43, 377. Pave A. (1977). Description languages, a toe for mathematical modeling. First world conf. on math. at the service of man. Barcelone. Pave A., Pagnotte Y. (1977). Comput. in BioI Med., 7,4. Prudhomme-J.C. (1976). Contribution a l'etude de la biosynthese de la soie dans le glande sericigene de Bombyx mori. Ph. D. Thesis Lyon. Schaal R. (1971). La cinetique chimique home gene. P.U.F., Paris. Victor J., Haselkorn D., Pecht I. (1973). Biom. Res., 6, 121. Volterra V. (1931). Le90ns sur la theorie mathematique de la lutte pour la vie. Gautier-Villars, Paris. Wagner M., Czerlinski G., Pring M. (1975). Comput. Bio!. Med., 2., 105. Q
xlO mg 9
If V is seen as the product of a diagonal matrix K with k . as diagonal terms by a vector T where t. is~obviously J
t.
J
6
=
(multi linear terms)
then dX dt
=DT
3
I
DI is the matrix of elements k. d .. , we use this matrix in our paper. J ~J REFERENCES Bellman R., Jacquez J., Kalaba R., Schwimrner S. (1967). Math. Biosc., 1,71. Brockett R.W. (1972). On the algebric structure of bilinear systems. Proc. U.S.A. Italy seminar "variable struct. systems". Sorrento. Chandler J.P., Hill D.E., Spivey H.O. (1072). Comput. Biom. Res., 5, 515. Darvey I.G., Shrager R.: Kohn L.D. (1975). J. BioI. Chem., ~, 4696.
o
5
10
t (days)
experime nt a l points: norma l populati on
•
tr ea t ed populati on simulat ed c urve s Fig . 1 . Acti v it y of the silkgland. Evolution of Q (RNA quantit y) during the l as t l arval instar.