Accuracy of Parameter Sensitivities of DAE Systems using Finite Difference Methods Atiyah Elsheikh ∗ Wolfgang Wiechert ∗∗ Austrian Institute of Technology, Energy Department, Vienna, Austria (e-mail:
[email protected]) ∗∗ IBG-1: Biotechnology, Research Centre of J¨ ulich, J¨ ulich, Germany (e-mail:
[email protected]) ∗
Abstract: This work evaluates the accuracy of Finite Difference (FD) methods when computing parameter sensitivities of badly-scaled DAE systems. Due to their simple implementation, FD methods are commonly favoured especially when the underlying mathematical model is a hard-coded sophisticated simulator. Nevertheless, FD methods may impose serious numerical problems even if FD step sizes and solver tolerances w.r.t. the order of the FD scheme are ideally selected. Judging the precision of the resulting parameter sensitivities is practically difficult. With the availability of powerful Automatic Differentiation (AD) tools for equation-based simulation languages like ADModelica, there is a new possibility to examine step sizes of various FD schemes, solver tolerances and the resulting precision for realistic large scale examples. This can be done by comparing numerical parameter sensitivities with highly precise analytical solutions using direct integration of sensitivity equation systems generated by AD techniques. It is shown with a realistically sized example that FD methods are actually more critical than usually assumed. Keywords: Finite difference methods, differential algebraic equations, sensitivity analysis, automatic differentiation, accuracy, simulation languages. 1. INTRODUCTION In mathematical modelling, it is the whole package of sensitivity analysis, model validation, optimization and others which are needed to gain knowledge. In the context of DAE models, a significant computational part within all these applications is based on the computationally expensive task of computing parameter sensitivities. Formally, consider the DAE system: F (x, ˙ x, p, t) = 0 n
x(t0 ) = x0 (p)
,
(1)
m
where x(t) ∈ R and p ∈ R represent state variables and model parameters, respectively. Required are timedependent parameter sensitivities dx/dp(t) ∈ Rm×n (or a subset of them). According to their mathematical definitions, these quantities describe the impact of a slight change of parameters p on state variables x and are very useful for carrying out the mentioned mathematical tools. A straightforward way for approximating the sensitivities is realized by applying FD schemes to the numerical solution of (1), described in literature as the brute force or the indirect method. In contrary to other known methods such as the direct method (Atherton et al., 1975), Greens function method (Hwang et al., 1978) and the polynomial approximation method (Hwang, 1983), the indirect method does not require manipulation of the underlying model equations and it is applied in a black-box manner. The accuracy of the resulting sensitivities does not only depend on the step-size choices and the order of the selected FD-scheme but it also depends on the precision of
the numerical solution of the DAE-system (1). Therefore, judging the accuracy of these approximated solutions is practically difficult for large-scale DAE systems. Alternatively, direct integration methods which are realized by numerical integration of sensitivity equation systems can be employed. The accuracy of the resulting numerical solution is controlled by the underlying numerical solver. This work presents a way with which the setting choices of FD schemes, i.e. the step-size and the method order can be examined, especially in the context of largescale DAE systems. This is done by comparing the resulting parameter sensitivities of both methods together. This could be rarely done in the past because powerful AD tools for realistically sized highly non linear and badly scaled models were not available. The rest of this work is structured as follows. Section 2 presents the indirect method and lists some potential numerical errors behind this method in the context of DAE systems. Section 3 introduces direct integration methods and summarizes common ways of integrating large sensitivity equation systems. Section 4 presents common AD techniques with which runtime performance computation of parameter sensitivities of large-scale DAE systems become practically possible. Section 5 demonstrates a benchmark for emphasizing the conclusions drawn in section 6.
2. PARAMETER SENSITIVITIES OBTAINED BY FD SCHEMES APPLIED ON NUMERICAL SOLUTIONS It is well-known that even higer order FD methods impose serious numerical accuracy problems if FD settings together with solver tolerance are not carefully selected . Formally, assume that the numerical integration of equation (1) leads to the solution: x¯i,k (p)
=
xi (tk , p) + Ei,k ,
i ∈ {1, 2, ..., n} ,
(2)
k = 0, 1, 2, ..
where |Ei,k | < TolR · |xi (tk , p)| + TolA represents the local error within the approximated solution x ¯i,k to the true solution xi (tk , p) controlled by the given relative tolerance TolR and the absolute tolerance TolA at time step tk . Then by using Taylor series approximation, the required parameter sensitivities can be approximated as: d¯ xi x ¯i,k (p + ej δj ) − x ¯i,k (p) (tk , p) = + O(δj ) dpj δ j + (3) where ej ∈ Rm is the unit vector with the j-th component equal to one and δj is a small scaler value proportional to the value pj (e.g. δj = ξj ·pj by a user defined perturbation factor ξj ). The implementation is straightforward and requires m+1 simulations of the DAE system (1). However, δj needs to be carefully selected in order to produce accurate results. From one side, a very small δj results in numerical errors in arithmetic division. From the other side, the choice of a large δj leads to large numerical errors in the order of O(δj ). In general, a higher-order Central Difference (CD) formula can be also employed (Eberly, 2001). In order to perform a CD formula of order two (i.e O(δj2 )), parameter sensitivities are approximated as the average of equation (3) and the equation: x ¯i,k (p) − x¯i,k (p − ej δj ) d¯ xi (tk , p) = + O(δj ) dpj δj − (4) for each parameter pj . The difference between equation (3) (4) is used as a measure of the influence of the non linearity of the considered model on numerical errors in FD schemes as done in (De Pauw and Vanrolleghem, 2003). In this study, it has been revealed that an optimal perturbation factor ξj is both model-dependent and parameterdependent (an optimal value for ξj is different for each parameter pj ). Applying CD scheme of order two requires 2m simulations. The higher the order of the FD scheme is, the more simulations are required. For instance for an FD scheme of order four such as: d¯ xi 8¯ xi,k (p + ej δj ) − x¯i,k (p + 2ej δj ) (tk , p) = dpj (12δj ) 8¯ xi,k (p − ej δj ) − x¯i,k (p − 2ej δj ) − + O(δj4 ) (12δj )
(5)
4m simulations are required. Additionally, the used solver tolerances Tol{A,R} may impact the precision of the computed derivatives dx/dp as shown as follows. By substituting (2) into equation (3), the true derivative values become:
d¯ xi (tk , p) dpj xi (tk , p + ej δj ) − xi (tk , p) + O(Ei,k ) = + O(δj ) (6) δj dxi Ei,k = (tk , p) + O( ) dpj δj This implies that numerical errors in FD formulas should consider a choice of δj which satisfies: δj ≫ |Ei,k (p)| (7) ∀i = 1, .., n & ∀j = 1, .., m & ∀k = 0, 1, 2, .. Hence an ideal choice of δj should consider the tolerance values TolA,R . From one side, a big tolerance value leads to low solution precision and limits the choices of δj . From the other side, a very small tolerance leads to higher computational efforts and may excessively increase the runtime of the underlying solver. Additionally, an ideal choice for δj is different for each variable xi , time step tk , the chosen tolerance and the parameter values p. However within the standard FD formula (3) • δj is constant through the whole simulation time course for all time steps tk • δj is constant for all variables xi It is not surprising that there are situations shown in section 5 where an ideal choice of δj is not found and FD results are inaccurate. This agrees with similar complaints reported in literature (Brenan et al., 1989) concerning DAE solvers employing FD schemes for computing the Jacobian, although more sophisticated algorithms for FD computation are performed with variable step-sizes δj at each time step tk . 3. DIRECT INTEGRATION OF SENSITIVITY EQUATION SYSTEMS Another way for improving the accuracy is to compute parameter sensitivities analytically. Namely, for DAE systems of the form (1) with differential index less than or equal to one, the corresponding parameter sensitivities can be computed by direct numerical integration of the sensitivity equation system consisting of the original DAE system (1) and the sensitivity equation subsystems obtained by differentiating all equations w.r.t. desired parameters: ∂x0 (p) Fx˙ s˙ i + Fx si + Fpi = 0 , si (t0 ) = (8) ∂pi ∂x where si = for i = 1, 2, ..., m ∂pi The dimension of the whole system is equal to n + mn. The terms Fx˙ , Fx and Fp depend on x solved by the original DAE system (1). Hence, the sensitivity equation subsystems (8) cannot be decoupled from the original DAE system (1). Integration of the sensitivity equation systems is usually inefficient and could be much slower than low order FD schemes. The reason is that the number of iterations required by the DAE solver for maintaining error tolerance may become excessively enlarged. Moreover, high-dimensional Jacobians need to be factorized for the solution process of the underlying Gauss-Newton scheme. Such direct way could be unreliable from runtime performance point of views. Nevertheless, the precision of
the obtained parameter sensitivities can be controlled by the used DAE solver. Alternatively, many other methods for efficient integration of sensitivity equation systems exist and can be classified into two categories in which (1) the sensitivity equation system is externally decoupled into smaller DAE systems for fast successive solution (Atherton et al., 1975; Dickinson and Gelinas, 1976; Dunker, 1978). (2) the Jacobian structure of the whole sensitivity equation system is exploited for improved integration and factorization. Decoupling is done internally at solver’s iteration level (Caracotsios and Stewart, 1985; Maly and Petzold, 1995; Feehery et al., 1997). In this work, a modification of the method of (Dickinson and Gelinas, 1976) is employed. The DAE system (1) together with only one sensitivity equation subsystem (8) are integrated together. This requires the solution of m DAE systems each of size 2n. For the type of models used, further significant runtime performance improvement is achieved by integrating 8 subsystems together. 4. IMPROVING REPRESENTATION OF SENSITIVITY EQUATION SYSTEMS WITH AUTOMATIC DIFFERENTIATION In order to achieve further significant performance improvement, the representation of the sensitivity equation subsystems can be computed by employing AD techniques and tools (Griewank, 2000; Naumann et al., 2004). AD is a methodology that refers to algorithmic techniques for semantic augmentation of numerical programs with additional code for computing derivatives. AD employs common compiler techniques to compute an efficient representation of analytical derivatives without the drawbacks present in explicit differentiation and FD methods. Within explicitly differentiated formulas, many common sub-expressions are excessively evaluated. The following example is given in order to distinguish between AD and explicit differentiation. A typical equation system of a chemical reaction within a large biochemical network model would look similar to: S I S˙ = −v , v = V (9) S+K S+I The corresponding sensitivity subsystem w.r.t. any parameter p using explicit differentiation techniques may become: S˙ p = − vp S I vp = Vp + S+K S +I Sp (S + K) − S (Sp + Kp ) I V + (10) 2 S + I (S + K) S Ip (S + I) − I(Sp + Ip ) V 2 S+K (S + I) The terms S +I, 1/(S +I) and S/(S +I) are evaluated 6, 5 and 4 times at each single iteration within the DAE-solver. Additionally, the same terms appear in all sensitivity equations obtained by differentiating the kinetic equation (9) w.r.t. all parameters.
The key concept behind AD is that any mathematical function expressed as a program, no matter how complex it is, can be decomposed into a limited set of simplified binary equations composed of elementary operations (e.g. +,−,..etc.) and intrinsic functions (e.g. sin and cos). The derivative of each of these elementary operations can be computed by applying the chain rule to combine their local partial derivatives. Consequently, the resulting common sub-expressions are executed only once each iteration. Given equation (9) and using the Abstract Syntax Tree (AST) representation shown in figure 1, the kinetic equation and the corresponding sensitivity equation w.r.t. a set of parameters p1 , p2 , ..., pm can be represented with a gradient object v ′ together with only two temporary variables as: T1 := S + K =⇒ T1′ := S ′ + K ′ T1 := 1/T1 =⇒ T1′ := − T1′ T1 T1 T1 := S T1 =⇒ T1′ := S ′ T1 + S T1′ T2 := S + I =⇒ T2′ := S ′ + I ′ T2 := 1/T2 =⇒ T2′ := − T2′ T2 T2 T2 := S T2 =⇒ T2′ := I ′ T2 + I T2′ T1 := T1 T2 =⇒ T2′ := T1′ T2 + T1 T2′ v := V T2 =⇒ v ′ := V ′ T2 + V T2′ T
where
X ′ = (dX/dp1 , dX/dp2 , ..., dX/dpm ) ∀ variables X and := corresponds to an assignment relationship between an output variable and an (a set of) input variable(s).
Fig. 1. Binary AST of the right hand side of the kinetic equation from (9). Each non-leave node is represented by a reusable temporary variable. The index of the temporary variable refers to the position of that variable in the tree The following table summarizes the number of FLoatingpoint OPerations (FLOPs) for one evaluation of equation (9) and its parameter sensitivities using all presented approaches for sensitivity analysis. # Ops Eq. (9) CD equation (5) CA equation (10) AD example
+ 2 8m 11m 5m
− 1 4m 3m 3m
× 2 8m 9m 9m
/ 2 9m 8m 0
Hence, AD seems to be the most promising approach taking into account that division is computationally the most expensive arithmetic operation. Further advanced topics in AD aim at the efficient use of memory space by
Fig. 2. E i,j of order 2 for 30 parameters and 15 variables with ξj = 0.01 and T ol = 10−12
Fig. 3. E i,j of a state variable xi (t) ∈ [0.01, 0.11] w.r.t. a parameter pj = 0.5 for various tolerances and perturbation factors using FD scheme of order 2.
reusing temporary variables and exploiting potential sparsity in the Jacobian. These topics and further performance perspectives are out of the scope of this work, algorithmic details are given in (Griewank, 2000). In summary, with AD techniques employed within advanced specialized integration methods computation of parameter sensitivities of large-scale DAE systems become possible. ADModelica: Automatic differentiation of Modelica models The study carried out in this work is applied on models implemented by the common equation-based objectoriented language Modelica (Fritzson, 2003). The Modelica language was initiated as a standardized specification for DAE-based models adequate for model exchange, multidisciplinary modelling and fast physical prototyping (Elmqvist and Mattsson, 1997). It relies on universal modelling concepts with which the task of modelling becomes the matter of dragging, dropping and connecting icons together (Elmqvist, 1978). Meanwhile, the modeller does not need to pay attention at mathematical details (e.g. index reduction of DAE systems with higher differential index). Common Modelica simulation environments are then responsible for compiling the high-level specification into efficient simulation code (Fritzson and et al., 2002; Fritzson et al., 2002; Nytsch-Geusen and et al., 2005) . For the task of computing parameter sensitivities of Modelica models, the tool ADModelica has been employed (Elsheikh et al., 2008; Elsheikh and Wiechert, 2008). ADModelica uses Modelica-based compiler techniques for computing sensitivity equations as a Modelica model. Modelica simulation environments are used for generating simulation code for computing parameter sensitivities. 5. BENCHMARK As a benchmark, a DAE model corresponding to typical biochemical reaction networks describing the central
Fig. 4. E i,j of a state variable xi (t) ∈ [18332.37, 18336.73] w.r.t. a parameter pj = 0.5 for various tolerances and perturbation factors using FD scheme of order 2. metabolism of the Coryne bacterium is used. The system with 173 equations and 268 parameters is directly implemented in the Modelica language and is automatically generated by the software framework described in (Tillack et al., 2009). The simulation environment Dymola (www.dymola.com) is used for simulating the Modelica model relying on the well-known DASSL and LSODAR solvers. In this environment relative tolerance and absolute tolerance are equal by default and can be set by the user. The equal tolerances are denoted by Tol. The accuracy of FD schemes of order 1, 2 and 4 is examined by comparing the corresponding results with parameter sensitivities computed by ADModelica according to the Average Relative Error (ARE) criteria: ¯i N ∂xi (t ) − ∂ x X ∂pj (tk ) ∂pj k i,j E = (11) /N ∂x i ∂p (tk ) k=1
j
Figure 2 shows ARE for a subset of parameter sensitivities using reasonable choices for FD-schemes. While a subset of parameter sensitivities can be accurately reproduced by
Table 1. The values of the chosen parameters pj and variables xi as well as the chosen solver tolerance Tol and the corresponding FD step size δj in the performed benchmark Cases Tol δj pj |xi | ∈
a 10−4 10−2
b 10−4 10−3
c 10−2 10−1
d 10−5 10−4 1.0 [10−2 , 102 ]
e 10−12 10−4
f 10−12 10−6
g 10−12 10−3 ∗ pj < 10−3 > 105
h 10−12 10−3 ∗ pj > 103 < 10−5
a
b
c
Fig. 5. AD vs. FD with the settings Tol = 10−4 and δj = ξj = 10−2 .
Fig. 6. analogously with the settings Tol = 10−4 and δj = ξj = 10−3 .
Fig. 7. Settings Tol = 10−2 and δj = ξj = 10−1 .
d
e
f
Fig. 8. Settings Tol = 10−5 and δj = ξj = 10−4 .
Fig. 9. Settings Tol = 10−12 and δj = ξj = 10−4 .
Fig. 10. Settings Tol = 10−12 and δj = ξj = 10−6 .
g
h
Fig. 11. A numerically difficult case with Tol = 10−12 and δj = 10−3 .
Fig. 12. A numerically difficult case with Tol = 10−12 and δj = 10−3 .
FD, some of the parameter sensitivities are inaccurate. Employing FD-scheme of order 4 does not improve the accuracy. By taking a closer look into some of the inaccurate parameter sensitivities, it can be shown that there is no unique relevant choice appropriate for all parameter sensitivities. Figure 3 demonstrates the influence of solver tolerances on the accuracy of a specific approximated parameter sensitivity. While some FD-Scheme choices have produced reasonable results, some of these choices were not reliable for computation of another derivative w.r.t. the same parameter as shown in figure 4. In this figure, the accuracy of the corresponding parameter sensitivity depends largely on the choice of the perturbation factor. An ideal choice of a perturbation factor is not the same among all variables w.r.t. that specific parameter.
Figures 5-12 show a comparison of parameter sensitivities computed using the direct and indirect methods with FD schemes of order 1, 2 and 4 applied on numerical solutions generated by the LSODAR solver. The values of the chosen parameters and variables as well as the used settings of some critical cases are summarized in table 5. The chosen benchmark reveals some situations where FD results are not accurate as shown in all cases except figure 9 (case e). This behaviour can be explained by equation (6) which implies that parameter sensitivities could be reasonably approximated by avoiding: • big Tol (e.g. figures 5 and 7, cases a and c) • δj ≈ Tol (e.g. figures 6 and 8, cases b and d) • small δj (e.g. figure 10 case f)
However even with apparently reasonable setting choices, figures 11 and 12 (cases g and h) still show inaccurate behaviour of FD methods with parameters and variables varying within extremely different ranges of values, typical for reasonably large DAE models. This implies that within badly-scaled DAE systems, there would be boundary cases where an ideal choice of δj which • does not violate inequality (7) for any variable xi at any time step tk • is not small or not large enough for avoiding numerical errors becomes practically difficult. In the considered benchmark, by applying FD-scheme oforder one and two for different values of xi ∈ 10−1 , 10−5 and with T OL = 10−12 , only less than 60%,70% of the numerical parameter sensitivities could be reproduced with E i,j < 0.01 ∀xi , pj using FD schemes of order 1,2 respectively. By applying FD-scheme of order four, no significant improvement is achieved. 6. CONCLUSION This study encourages the employment of analytical techniques for computing parameter sensitivities of large-scale DAE-systems whenever possible, especially if accurate results of high-precision are effectively important. Inaccurate parameter sensitivities resulting from FD methods may cause some difficulties when performing mathematical applications. For instance termination criteria may not be satisfied by derivative-based optimization algorithms. Consequently, conclusions about parameter identifiability through Monte Carlo analysis of parameter estimation could be misleading. That is, FD methods should be avoided. Nevertheless, for hard-coded DAE-based simulators for which FD-schemes represent the best for computing parameter sensitivities, selection of FD settings need to be done with care. Even in this case, the accuracy can not be directly examined. ACKNOWLEDGEMENTS We acknowledge Stephan Noack and Jana Tillack for sharing biochemical reaction network models of the central metabolism of the Coryne bacterium. REFERENCES Atherton, R.W., Schainker, R.B., and Ducot, E. (1975). On the statistical sensitivity analysis of models for chemical kinetics. AIChE Journal, 21(3), 441–448. Brenan, K.E., Campbell, S.L., and Petzold, L.R. (1989). Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. North-Holland, New York, USA. Caracotsios, M. and Stewart, W.E. (1985). Sensitivity analysis of initial value problems with mixed ODEs and algebraic equations. Computers and Chemical Engineering, 9(4), 359–365. De Pauw, D.J. and Vanrolleghem, P.A. (2003). Practical aspects of sensitivity analysis for dynamic models. In MATHMOD 2003: The 4th Vienna International Conference on Mathematical Modelling. Vienna, Austria. Dickinson, R. and Gelinas, R. (1976). Sensitivity analysis of ordinary differential equation systems - a direct
method. Journal of Computational Physics, 21, 123– 143. Dunker, A.M. (1978). The decoupled direct method for calculating sensitivity coefficients in chemical kinetics. The Journal of Chemical Physics, 81, 2385–2393. Eberly, D. (2001). Derivative approximation by finite differences. Technical report, Geometric tools, LLC. Elmqvist, H. (1978). A structured model language for large continuous systems. Ph.D. thesis, Lund Institute of Technology, Lund, Sweden. Elmqvist, H. and Mattsson, S.E. (1997). Modelica - the next generation modeling language: An international design effort. In ESS97: The 9th European Simulation Symposium. Passau, Germany. Elsheikh, A., Noack, S., and Wiechert, W. (2008). Sensitivity analysis of Modelica applications via automatic differentiation. In Modelica’2008: The 6th International Modelica Conference. Bielefeld, Germany. Elsheikh, A. and Wiechert, W. (2008). Automatic sensitivity analysis of DAE-systems generated from equationbased modeling languages. In C.H. Bischof, H.M. B¨ ucker, P.D. Hovland, U. Naumann, and J. Utke (eds.), Advances in Automatic Differentiation, 235–246. Springer. Feehery, W.F., Tolsma, J.E., and Barton, P.I. (1997). Efficient sensitivity analysis of large-scale differentialalgebraic systems. Applied Numerical Mathematics, 25(1), 41–54. Fritzson, P. and et al. (2002). The open source Modelica project. In Modelica’2002: The 2nd International Modelica Conference. Munich, Germany. Fritzson, P. (2003). Principles of Object-Oriented Modeling and Simulation with Modelica. Wiley-IEEE Computer Society Press. Fritzson, P., Ounnarsson, J., and Jirstr, M. (2002). MathModelica - An extensible modeling and simulation environment with integrated graphics and literate programming. In Modelica’2002: The 2nd International Modelica Conference. Munich, Germany. Griewank, A. (2000). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, Philadelphia, PA. Hwang, J.T., Dougherty, E.P., Rabitz, S., and Rabitz, H. (1978). The Green’s function method of sensitivity analysis in chemical kinetics. The Journal of Chemical Physics, 69(11), 5180–5191. Hwang, J.T. (1983). Sensitivity analysis in chemical kinetics by the method of polynomial approximations. International Journal of Chemical Kinetics, 15(10), 959–987. Maly, T. and Petzold, L.R. (1995). Numerical methods and software for sensitivity analysis of differential-algebraic systems. Applied Numerical Mathematics, 20, 57–79. Naumann, U., Utke, J., and Walther, A. (2004). An introduction to developing and using software tools for automatic differentiation. In P. Neittaanm¨aki, T. Rossi, S. Korotov, E.O. nate, J. P´eriaux, and D. Kn¨orzer (eds.), ECCOMAS 2004: The 4th European Congress on Computational Methods in Applied Sciences and Engineering. University of Jyv¨askyl¨a, Jyv¨askyl¨a, Finland. Nytsch-Geusen, C. and et al. (2005). MOSILAB: Development of a Modelica based generic simulation tool supporting model structural dynamics. In Modelica’2005:
The 4th International Modelica Conference. Hamburg, Germany. Tillack, J., Noack, S., N¨ oh, K., Elsheikh, A., and Wiechert, W. (2009). A software framework for modeling and simulation of dynamic metabolic and isotopic systems. In MATHMOD 2009: The 6th Vienna International Conference on Mathematical Modelling. Vienna, Austria.